This paper addresses the behaviour of the mutual information of correlated MIMO Rayleigh channels
when the numbers of transmit and receive antennas converge to $+\infty$ at the same rate. Using a new
and simple approach based on the Poincaré-Nash inequality and on an integration by parts formula, it is
rigorously established that the mutual information converges to a Gaussian random variable whose mean
and variance are evaluated. These results confirm previous evaluations based on the powerful but
non-rigorous replica method. It is believed that the tools that are used in this paper are simple, robust, and
of interest for the communications engineering community.
I. Introduction

It is widely known that high spectral efficiencies are attained when multiple antennas are used at both
the transmitter and the receiver of a wireless communication system. Indeed, due to the mobility and
to the presence of a large number of reflected and scattered signal paths, the elements of the $N \times n$
Multiple Input Multiple Output (MIMO) channel matrix with $N$ antennas at the receiver's site and $n$
antennas at the transmitter's are often modeled as random variables. Assuming a random model for this
matrix, Telatar realized in the mid-nineties that Shannon's capacity of such channels increases at the rate
of $\min(N, n)$ for a fixed transmission power [1]. A result of the same nature can be found in the work
of Foschini and Gans [2]. The authors of [1] and [2] assumed that the elements of the channel matrix
$G$ are centered, independent and identically distributed (i.i.d.). In this context, a well known
result in Random Matrix Theory (RMT) [3] says that the eigenvalue distribution of the Gram matrix
$GG^*$, where $G^*$ is the Hermitian adjoint of $G$, converges to a deterministic probability distribution as $n$
goes to infinity and $N/n$ converges to a constant $c > 0$. Denote by $I(\rho) = \log\det\left(\frac{\rho}{n} GG^* + I_N\right)$ the
capacity of channel $G$ for a Signal to Noise Ratio at a receiver antenna equal to $\rho/n$. One consequence
of [3] is that the capacity per transmit antenna $I(\rho)/n$, being an integral of a log function with respect
to the empirical eigenvalue distribution of $GG^*$, converges to a constant. This fact, already observed in
[1], supports the assertion that capacity increases linearly with the number of antennas. In addition, this
convergence proves to be sufficiently fast: the asymptotic results predicted by
RMT remain relevant for systems with a moderate number of antennas.

The next step was to apply this theory to channel models that include a correlation between paths (or 
entries of G). One of the main purposes of this generalization is to better understand the impact of 
these correlations on Shannon's mutual information. Let us cite in this context the contributions [4], 
[5], [6], [7] and [8], all devoted to the study of the mutual information in the case where the elements
of the channel matrix are centered and correlated random variables. In [9], a deterministic equivalent is
computed under broad conditions for the capacity based on Rice channels modeled by non-centered 
matrices with independent but not identically distributed random variables. The link between matrices 
with correlated entries and matrices with independent entries and a variance profile is studied in [10]. 

One of the most popular correlated channel models used for these capacity evaluations is the so-called
Kronecker model $G = \Psi W \tilde{\Psi}$ where $W$ is a $N \times n$ matrix with Gaussian centered i.i.d. entries, and $\Psi$
and $\tilde{\Psi}$ are $N \times N$ and $n \times n$ matrices that capture the path correlations at the receiver and at the transmitter
sides respectively [11], [12]. This model has been studied by Chuah et al. in [5]. With some assumptions
on matrices $\Psi$ and $\tilde{\Psi}$, these authors showed that $I(\rho)/n$ converges to a deterministic quantity defined
as the fixed point of an integral equation. Later on, Tulino et al. [8] obtained the limit of $I(\rho)/n$ for a
correlation model more general than the Kronecker model. Both these works rely on a result of Girko
describing the eigenvalue distribution of the Gram matrix associated with a matrix with independent but
not necessarily identically distributed entries, a closely related model as we shall see in a moment.

In [7], Moustakas et al. studied the mutual information for the Kronecker model by using the so-called
replica method. They found an approximation $V(\rho)$ of $\mathbb{E}[I(\rho)]$ accurate to the order $1/n$ in the large $n$
regime. Using this same method, they also showed that the variance of $I(\rho) - V(\rho)$ is of order one and
were able to derive this variance for large $n$.

Although the replica technique is powerful and has a wide range of applications, the rigorous justification
of some of its parts remains to be done. In this paper, we propose a new method to study the convergence
of $\mathbb{E}[I(\rho)]$ and the fluctuations of $I(\rho)$. Besides recovering the results in [7], we establish the Central Limit
Theorem (CLT) for $I(\rho) - V(\rho)$. Such a result is of practical interest since the CLT leads
to an evaluation of the outage probability, i.e. the probability that $I(\rho)$ lies beneath a given threshold, by
means of the Gaussian approximation. Many other works have been devoted to CLTs for random matrices.
Close to our present article are [13], [14], [15].

In this article, we would also like to advocate the method used to establish both the approximation of
$I(\rho)$ in the large $n$ regime and the CLT. Due to the Gaussian character of the entries of matrix $G$, two
simple ingredients are available. The first one is an Integration by parts formula (16) that provides an
expression for the expectation of certain functionals of Gaussian vectors. This formula has been widely
used in RMT [16]-[18]. The second ingredient is the Poincaré-Nash inequality (17) that bounds the variance
of functionals of Gaussian vectors. Although well known [19], [20], its application to RMT is fairly
recent [18]. This inequality enables us to control the decrease rate of the approximation errors such as
the order $1/n$ error $\mathbb{E}[I(\rho)] - V(\rho)$. We believe that these tools, which prove to be simple and robust,
might be of great interest for the communications engineering community.

The paper is organized as follows. In Section II we introduce the main notations; we also state the two
main results of the article. In Section III we recall general matrix results and the two aforementioned
Gaussian tools. Section IV is devoted to the proof of the first order result, that is the approximation of
$\mathbb{E}[I(\rho)]$. The CLT, also referred to as the second order result, is established in Section V. Proof details
are in an appendix.

II. Notations and Statement of the main results 

A. From a Kronecker model to a separable variance model. 

Consider a MIMO system represented by a $N \times n$ matrix $G$ where $n$ is the number of antennas at the
transmitter and $N$ is the number of antennas at the receiver and where $N(n)$ is a sequence of integers
such that

$$\lim_{n\to\infty} \frac{N(n)}{n} = c > 0 .$$

Assuming the transmitted signal is a Gaussian signal with a covariance matrix equal to $\frac{1}{n} I_n$ (and thus, a
total power equal to one), Shannon's mutual information of this channel is $I_n(\rho) = \log\det\left(\frac{\rho}{n} GG^* + I_N\right)$,
where $\rho > 0$ is the inverse of the additive white Gaussian noise variance at each receive antenna. The
general problem we address in this paper concerns the behaviour of the mutual information for large values
of $N$ and $n$ in the case where the channel matrix $G$, assumed to be random, is described by the Kronecker
model $G = \Psi W \tilde{\Psi}$. In this model, $\Psi$ and $\tilde{\Psi}$ are respectively $N \times N$ and $n \times n$ deterministic matrices
and $W$ is random with independent entries distributed according to the complex circular Gaussian law
with mean zero and variance one, $\mathcal{CN}(0, 1)$.

It is well known that this model can be replaced by a simpler Kronecker model involving a matrix
with Gaussian independent (but not necessarily identically distributed) entries. Indeed, let $\Psi = U D_n^{1/2} V^*$
(resp. $\tilde{\Psi} = \tilde{U} \tilde{D}_n^{1/2} \tilde{V}^*$) be a Singular Value Decomposition (SVD) of $\Psi$ (resp. $\tilde{\Psi}$), where $D_n$ (resp. $\tilde{D}_n$)
is the diagonal matrix of eigenvalues of $\Psi\Psi^*$ (resp. $\tilde{\Psi}\tilde{\Psi}^*$). Then $I_n(\rho)$ writes:

$$I_n(\rho) = \log\det\left(\frac{\rho}{n} Y_n Y_n^* + I_N\right),$$

where $Y_n = D_n^{1/2} X_n \tilde{D}_n^{1/2}$ is a $N \times n$ matrix, $D_n$ and $\tilde{D}_n$ are respectively $N \times N$ and $n \times n$ diagonal
matrices, i.e.

$$D_n = \mathrm{diag}\left(d_i^{(n)},\ 1 \le i \le N\right) \quad\text{and}\quad \tilde{D}_n = \mathrm{diag}\left(\tilde{d}_j^{(n)},\ 1 \le j \le n\right),$$

and $X_n = V^* W \tilde{U}$ has i.i.d. entries with distribution $\mathcal{CN}(0, 1)$ since $V$ and $\tilde{U}$ are deterministic unitary
matrices. Since every individual entry of $Y_n$ has the form $Y_{ij}^{(n)} = \sqrt{d_i^{(n)} \tilde{d}_j^{(n)}}\, X_{ij}$, we call $Y_n$ a random
matrix with a separable variance profile.
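The reduction from the Kronecker model to the separable variance model can be checked numerically. The following is a minimal sketch, assuming NumPy; the matrices `Psi` and `Psi_tilde` are arbitrary hypothetical choices made for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, rho = 4, 6, 2.0


def crandn(rows, cols):
    """Matrix of i.i.d. standard complex circular Gaussian entries CN(0, 1)."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)


def mutual_information(M, rho, n):
    """log det(rho/n * M M^* + I) for a matrix M with n columns."""
    gram = M @ M.conj().T
    return np.linalg.slogdet(np.eye(M.shape[0]) + rho / n * gram)[1]


# Hypothetical correlation factors Psi (N x N) and Psi_tilde (n x n).
Psi, Psi_tilde, W = crandn(N, N), crandn(n, n), crandn(N, n)
G = Psi @ W @ Psi_tilde

# SVDs: Psi = U diag(s) V^*, Psi_tilde = U~ diag(s~) V~^*.
U, s, Vh = np.linalg.svd(Psi)
Ut, st, Vth = np.linalg.svd(Psi_tilde)

X = Vh @ W @ Ut                    # X = V^* W U~
Y = s[:, None] * X * st[None, :]   # Y = D^{1/2} X D~^{1/2}, with D = diag(s**2)

I_G = mutual_information(G, rho, n)
I_Y = mutual_information(Y, rho, n)
print(I_G, I_Y)  # the two values should agree up to rounding
```

The unitary factors $U$ and $\tilde{V}$ drop out of the determinant, which is why only the singular values of $\Psi$ and $\tilde{\Psi}$ matter.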

B. Assumptions and Notations. 

The centered random variable $X - \mathbb{E}[X]$ will be denoted by $\overset{\circ}{X}$. Element $(i, j)$ of a matrix $A$ will be
either denoted $[A]_{ij}$ or $A_{ij}$. Element $i$ of vector $a$ will be denoted $a_i$ or $[a]_i$. Column $j$ of matrix $A$ will
be denoted $a_j$. The transpose, the Hermitian adjoint (conjugate transpose) of $A$, and the matrix obtained
by conjugating its elements are denoted respectively $A^T$, $A^*$, and $\overline{A}$. The spectral norm of a matrix $A$
will be denoted $\|A\|$. If $A$ is square, $\mathrm{tr}A$ refers to its trace. Let $\mathrm{i} = \sqrt{-1}$; then the operators $\partial/\partial z$ and
$\partial/\partial\bar{z}$, where $z = x + \mathrm{i}y$ is a complex number, are defined by
$\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - \mathrm{i}\frac{\partial}{\partial y}\right)$ and $\frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + \mathrm{i}\frac{\partial}{\partial y}\right)$,
where $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial y}$ are the standard partial derivatives with respect to $x$ and $y$.

Throughout the paper, notation $K$ will denote a generic constant whose main feature is not to depend
on $n$. In particular, the value of $K$ might change from one line to another as long as it never depends
upon $n$. Constant $K$ might depend on $t \in \mathbb{R}_+$ and whenever needed, this dependence will be made more
explicit.

As usual, notation $a_n = O(\beta_n)$ is a flexible shortcut for $|a_n| \le K \beta_n$, and $a_n = o(\beta_n)$ for $a_n = \varepsilon_n \beta_n$
with $\varepsilon_n \to 0$ as $n$ goes to infinity.

In order to study a deterministic approximation of $I_n(\rho)$ and its fluctuations, the following mild
assumptions are required over the two triangular arrays $\left(d_i^{(n)},\ 1 \le i \le N,\ n \ge 1\right)$ and $\left(\tilde{d}_j^{(n)},\ 1 \le j \le n,\ n \ge 1\right)$:

(A1) The real numbers $d_i^{(n)}$ and $\tilde{d}_j^{(n)}$ are nonnegative and the sequences $\left(d_i^{(n)}\right)$ and $\left(\tilde{d}_j^{(n)}\right)$ are
uniformly bounded, i.e. there exist constants $d_{\max}$ and $\tilde{d}_{\max}$ such that

$$\sup_n \|D_n\| < d_{\max} \quad\text{and}\quad \sup_n \|\tilde{D}_n\| < \tilde{d}_{\max},$$

where $\|D_n\|$ and $\|\tilde{D}_n\|$ are the spectral norms of $D_n$ and $\tilde{D}_n$.
(A2) The normalized traces of $D_n$ and $\tilde{D}_n$ satisfy

$$\inf_n \frac{1}{n}\mathrm{tr}\left(D_n\right) > 0 \quad\text{and}\quad \inf_n \frac{1}{n}\mathrm{tr}\left(\tilde{D}_n\right) > 0 .$$

In the sequel, we shall frequently omit the subscript $n$ and the superscript $(n)$.

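The resolvent associated with $\frac{1}{n} Y_n Y_n^*$ is the $N \times N$ matrix $H_n(t) = \left(\frac{t}{n} Y_n Y_n^* + I_N\right)^{-1}$. Of prime
importance is the random variable $\beta(t) = \frac{1}{n}\mathrm{tr}\,DH(t)$ and its expectation $\alpha(t) = \frac{1}{n}\mathrm{tr}\,D\,\mathbb{E}H(t)$.
We furthermore introduce the $n \times n$ deterministic matrix defined by

$$\tilde{R}(t) = \left(I + t\alpha(t)\tilde{D}_n\right)^{-1} = \mathrm{diag}\left(\tilde{r}_j(t),\ 1 \le j \le n\right) \quad\text{where}\quad \tilde{r}_j(t) = \frac{1}{1 + t\alpha(t)\tilde{d}_j},$$

and the related quantity $\tilde{\alpha}(t) = \frac{1}{n}\mathrm{tr}\,\tilde{D}\tilde{R}(t)$. In a symmetric fashion, the $N \times N$ matrix $R(t)$ is defined
by

$$R(t) = \left(I + t\tilde{\alpha}(t)D_n\right)^{-1} = \mathrm{diag}\left(r_i(t),\ 1 \le i \le N\right) \quad\text{where}\quad r_i(t) = \frac{1}{1 + t\tilde{\alpha}(t)d_i} .$$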

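We finally introduce the solutions of a deterministic $2 \times 2$ system.

Proposition 1: For every $n$, the system of equations in $(\delta, \tilde{\delta})$

$$\delta = \frac{1}{n}\mathrm{tr}\,D_n\left(I + t\tilde{\delta}D_n\right)^{-1}, \qquad \tilde{\delta} = \frac{1}{n}\mathrm{tr}\,\tilde{D}_n\left(I + t\delta\tilde{D}_n\right)^{-1} \qquad (1)$$

admits a unique solution $\left(\delta_n(t), \tilde{\delta}_n(t)\right)$ satisfying $\delta_n(t) > 0$, $\tilde{\delta}_n(t) > 0$. Moreover, there exist nonnegative
measures $\mu_n$ and $\tilde{\mu}_n$ over $\mathbb{R}_+$ such that

$$\delta_n(t) = \int_{\mathbb{R}_+} \frac{\mu_n(d\lambda)}{1 + t\lambda}, \qquad \tilde{\delta}_n(t) = \int_{\mathbb{R}_+} \frac{\tilde{\mu}_n(d\lambda)}{1 + t\lambda}, \qquad (2)$$

where $\mu_n(\mathbb{R}_+) = \frac{1}{n}\mathrm{tr}\,D_n$ and $\tilde{\mu}_n(\mathbb{R}_+) = \frac{1}{n}\mathrm{tr}\,\tilde{D}_n$.
The proof is postponed to Appendix A.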
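The system of Proposition 1 is easy to solve numerically. The sketch below uses a plain alternating fixed-point iteration; this iteration is an illustrative choice and not the argument used in the paper's proof, and the function name `solve_fixed_point` and the test diagonals are hypothetical.

```python
import numpy as np


def solve_fixed_point(d, d_tilde, t, tol=1e-13, max_iter=10_000):
    """Alternating iteration for the 2 x 2 system of Proposition 1.

    d and d_tilde are the diagonals of D (length N) and D~ (length n).
    Both traces are normalised by n, as in the text.
    """
    d = np.asarray(d, dtype=float)
    d_tilde = np.asarray(d_tilde, dtype=float)
    n = d_tilde.size
    delta, delta_tilde = 1.0, 1.0  # arbitrary positive starting point
    for _ in range(max_iter):
        new_delta = np.sum(d / (1.0 + t * delta_tilde * d)) / n
        new_delta_tilde = np.sum(d_tilde / (1.0 + t * new_delta * d_tilde)) / n
        change = abs(new_delta - delta) + abs(new_delta_tilde - delta_tilde)
        delta, delta_tilde = new_delta, new_delta_tilde
        if change < tol:
            break
    return delta, delta_tilde


# Example with N = 6 receive and n = 8 transmit antennas.
d = np.linspace(0.5, 1.5, 6)
d_tilde = np.linspace(0.2, 2.0, 8)
t, n = 3.0, d_tilde.size
delta, delta_tilde = solve_fixed_point(d, d_tilde, t)
print(delta, delta_tilde)
```

In the uncorrelated square case ($D = \tilde{D} = I$, $N = n$, $t = 1$) the system reduces to $\delta(1 + \delta) = 1$, whose positive root is $(\sqrt{5} - 1)/2$, which gives a convenient check of the solver.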

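With $\delta$ and $\tilde{\delta}$ properly defined, we introduce the following $N \times N$ and $n \times n$ diagonal matrices:

$$T = \left(I + t\tilde{\delta}D\right)^{-1} \quad\text{and}\quad \tilde{T} = \left(I + t\delta\tilde{D}\right)^{-1} .$$

Notice in particular that $\delta = \frac{1}{n}\mathrm{tr}\,DT$ and $\tilde{\delta} = \frac{1}{n}\mathrm{tr}\,\tilde{D}\tilde{T}$ by (1). We finally introduce the following
quantities which are required to express the fluctuations of $I_n(\rho)$:

$$\gamma_n(t) = \frac{1}{n}\mathrm{tr}\,D^2T^2(t), \qquad \tilde{\gamma}_n(t) = \frac{1}{n}\mathrm{tr}\,\tilde{D}^2\tilde{T}^2(t) . \qquad (3)$$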



Proposition 2: Assume that Assumptions (A1) and (A2) hold and denote by

$$\sigma_n^2(t) = -\log\left(1 - t^2\gamma_n(t)\tilde{\gamma}_n(t)\right), \quad t > 0, \qquad (4)$$

where $\gamma_n(t)$ and $\tilde{\gamma}_n(t)$ are given by (3). Then $\sigma_n^2(t)$ is well-defined, i.e. $1 - t^2\gamma_n(t)\tilde{\gamma}_n(t) > 0$ for $t > 0$.
Moreover there exist nonnegative real numbers $m_t$ and $M_t$ such that

$$0 < m_t^2 \le \inf_n \sigma_n^2(t) \le \sup_n \sigma_n^2(t) \le M_t^2 < \infty \quad\text{for } t > 0 . \qquad (5)$$

Moreover, $\sigma_n^2(t)$ is upper-bounded uniformly in $n$ and $t$ for $t \in [0, \rho]$, i.e. $\sup_{t \le \rho} M_t^2 < \infty$.

Proof of Proposition 2 is postponed to Appendix B.
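The quantities of Proposition 2 follow directly from the solution of system (1). Here is a minimal sketch, assuming NumPy and a simple fixed-point iteration for (1); the helper names `solve_fixed_point` and `variance_terms` and the sample diagonals are hypothetical.

```python
import numpy as np


def solve_fixed_point(d, d_tilde, t, iters=2000):
    """Alternating iteration for the system (1); traces are normalised by n."""
    n = d_tilde.size
    delta = delta_tilde = 1.0
    for _ in range(iters):
        delta = np.sum(d / (1.0 + t * delta_tilde * d)) / n
        delta_tilde = np.sum(d_tilde / (1.0 + t * delta * d_tilde)) / n
    return delta, delta_tilde


def variance_terms(d, d_tilde, t):
    """Return (gamma, gamma_tilde, sigma^2) as defined in (3) and (4)."""
    n = d_tilde.size
    delta, delta_tilde = solve_fixed_point(d, d_tilde, t)
    T = 1.0 / (1.0 + t * delta_tilde * d)          # diagonal of T
    T_tilde = 1.0 / (1.0 + t * delta * d_tilde)    # diagonal of T~
    gamma = np.sum((d * T) ** 2) / n
    gamma_tilde = np.sum((d_tilde * T_tilde) ** 2) / n
    sigma2 = -np.log(1.0 - t ** 2 * gamma * gamma_tilde)
    return gamma, gamma_tilde, sigma2


d = np.linspace(0.5, 1.5, 6)
d_tilde = np.linspace(0.2, 2.0, 8)
for t in (0.5, 2.0, 10.0):
    gamma, gamma_tilde, sigma2 = variance_terms(d, d_tilde, t)
    # 1 - t^2 * gamma * gamma_tilde should stay in (0, 1), as Proposition 2 states.
    print(t, 1.0 - t ** 2 * gamma * gamma_tilde, sigma2)
```

For $D = \tilde{D} = I$, $N = n$ and $t = 1$ one finds $\gamma = \tilde{\gamma} = \delta^2$ with $\delta = (\sqrt{5} - 1)/2$, so that $\sigma^2 = -\log(1 - \delta^4)$.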

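Summary of the main notations.

In order to improve the readability of the paper, we gather all the notations in Table I. As expressed
there, there are three kinds of quantities:
1) Random quantities,
2) Deterministic quantities depending on the law of $YY^*$ via the expectation $\mathbb{E}$ with respect to the
entries of $Y$,
3) Deterministic quantities which only depend on the matrices $D$ and $\tilde{D}$, sometimes via $\delta$ and $\tilde{\delta}$ (as
defined in Proposition 1) which are easily computable.

| Random quantities | Deterministic quantities depending on the law of $YY^*$ via $\mathbb{E}$ | Deterministic quantities only depending on the variance structure via $D$ and $\tilde{D}$ |
|---|---|---|
| $H = \left(\frac{t}{n}YY^* + I\right)^{-1}$ | | |
| $\beta = \frac{1}{n}\mathrm{tr}\,DH$ | $\alpha = \frac{1}{n}\mathrm{tr}\,D\,\mathbb{E}H$ | $\delta = \frac{1}{n}\mathrm{tr}\,D(I + t\tilde{\delta}D)^{-1} = \frac{1}{n}\mathrm{tr}\,DT$ |
| | $\tilde{r}_j = (1 + t\alpha\tilde{d}_j)^{-1}$ | |
| | $\tilde{R} = (I + t\alpha\tilde{D})^{-1}$ | $\tilde{T} = (I + t\delta\tilde{D})^{-1}$ |
| | $\tilde{\alpha} = \frac{1}{n}\mathrm{tr}\,\tilde{D}\tilde{R} = \frac{1}{n}\mathrm{tr}\,\tilde{D}(I + t\alpha\tilde{D})^{-1}$ | $\tilde{\delta} = \frac{1}{n}\mathrm{tr}\,\tilde{D}(I + t\delta\tilde{D})^{-1} = \frac{1}{n}\mathrm{tr}\,\tilde{D}\tilde{T}$ |
| | $r_i = (1 + t\tilde{\alpha}d_i)^{-1}$ | |
| | $R = (I + t\tilde{\alpha}D)^{-1}$ | $T = (I + t\tilde{\delta}D)^{-1}$ |
| | | $\gamma = \frac{1}{n}\mathrm{tr}\,T^2D^2$, $\tilde{\gamma} = \frac{1}{n}\mathrm{tr}\,\tilde{T}^2\tilde{D}^2$ |
| | | $\sigma^2(t) = -\log\left(1 - t^2\gamma(t)\tilde{\gamma}(t)\right)$ |

TABLE I
SUMMARY OF THE MAIN NOTATIONS

The main goal of the forthcoming computations will be to approximate elements of the first and second
kind by elements of the third kind.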

C. Statement of the main results. 

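We now state the main results. Theorem 1 describes the first order approximation of the Shannon
capacity $I_n(\rho)$ while Theorem 2 describes its fluctuations when centered with respect to its first order
approximation.

Theorem 1: Let $X$ be a $N \times n$ matrix whose elements $X_{ij}$ are independent complex Gaussian variables
such that

$$\mathbb{E}(X_{ij}) = \mathbb{E}(X_{ij}^2) = 0, \quad \mathbb{E}(|X_{ij}|^2) = 1, \quad 1 \le i \le N,\ 1 \le j \le n,$$

and $Y = D^{1/2} X \tilde{D}^{1/2}$ where the diagonal matrices $D$ and $\tilde{D}$ satisfy Assumptions (A1) and (A2). Let
$I_n(\rho) = \log\det\left(\frac{\rho}{n}YY^* + I_N\right)$. Then, we have

$$\mathbb{E}[I_n(\rho)] = V_n(\rho) + O\left(\frac{1}{n}\right) \qquad (6)$$

as $n \to \infty$, $Nn^{-1} \to c \in\, ]0, \infty[$, where

$$V_n(\rho) = \log\det\left(I + \rho\tilde{\delta}_n(\rho)D_n\right) + \log\det\left(I + \rho\delta_n(\rho)\tilde{D}_n\right) - n\rho\,\delta_n(\rho)\tilde{\delta}_n(\rho),$$

and where $\left(\delta_n(t), \tilde{\delta}_n(t)\right)$ is the unique positive solution of the system

$$\delta = \frac{1}{n}\mathrm{tr}\,D_n\left(I + t\tilde{\delta}D_n\right)^{-1}, \qquad \tilde{\delta} = \frac{1}{n}\mathrm{tr}\,\tilde{D}_n\left(I + t\delta\tilde{D}_n\right)^{-1} .$$

Theorem 2: Assume that the setting of Theorem 1 holds and let $\sigma_n^2(\rho) = -\log\left(1 - \rho^2\gamma_n(\rho)\tilde{\gamma}_n(\rho)\right)$.
Then the random variable $\sigma_n^{-1}(\rho)\left(I_n(\rho) - V_n(\rho)\right)$ converges in distribution towards $\mathcal{N}(0, 1)$, where

$$\gamma_n(\rho) = \frac{1}{n}\mathrm{tr}\,D_n^2T_n^2(\rho), \quad \tilde{\gamma}_n(\rho) = \frac{1}{n}\mathrm{tr}\,\tilde{D}_n^2\tilde{T}_n^2(\rho) \quad\text{and}\quad T(\rho) = \left(I + \rho\tilde{\delta}D\right)^{-1}, \quad \tilde{T}(\rho) = \left(I + \rho\delta\tilde{D}\right)^{-1} .$$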
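Theorems 1 and 2 can be illustrated by a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions only: the dimensions, the variance profiles, the value of $\rho$ and the number of trials are arbitrary choices, and the helper names are hypothetical.

```python
import numpy as np


def solve_fixed_point(d, d_tilde, t, iters=2000):
    """Alternating iteration for the system (1); traces are normalised by n."""
    n = d_tilde.size
    delta = delta_tilde = 1.0
    for _ in range(iters):
        delta = np.sum(d / (1.0 + t * delta_tilde * d)) / n
        delta_tilde = np.sum(d_tilde / (1.0 + t * delta * d_tilde)) / n
    return delta, delta_tilde


def theory(d, d_tilde, rho):
    """Deterministic mean V_n(rho) and variance sigma_n^2(rho)."""
    n = d_tilde.size
    delta, delta_tilde = solve_fixed_point(d, d_tilde, rho)
    V = (np.sum(np.log1p(rho * delta_tilde * d))
         + np.sum(np.log1p(rho * delta * d_tilde))
         - n * rho * delta * delta_tilde)
    gamma = np.sum((d / (1.0 + rho * delta_tilde * d)) ** 2) / n
    gamma_tilde = np.sum((d_tilde / (1.0 + rho * delta * d_tilde)) ** 2) / n
    return V, -np.log(1.0 - rho ** 2 * gamma * gamma_tilde)


def simulate(d, d_tilde, rho, trials, rng):
    """Draw `trials` realisations of I_n(rho) = log det(rho/n YY^* + I_N)."""
    N, n = d.size, d_tilde.size
    out = np.empty(trials)
    for k in range(trials):
        X = (rng.standard_normal((N, n))
             + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
        Y = np.sqrt(d)[:, None] * X * np.sqrt(d_tilde)[None, :]
        out[k] = np.linalg.slogdet(np.eye(N) + rho / n * Y @ Y.conj().T)[1]
    return out


rng = np.random.default_rng(0)
d = np.linspace(0.5, 1.5, 20)        # N = 20
d_tilde = np.linspace(0.5, 2.0, 30)  # n = 30
rho = 2.0
V, sigma2 = theory(d, d_tilde, rho)
samples = simulate(d, d_tilde, rho, trials=400, rng=rng)
print("mean:", samples.mean(), "vs V_n:", V)
print("variance:", samples.var(), "vs sigma_n^2:", sigma2)
```

Even for a few tens of antennas, the empirical mean and variance should be close to $V_n(\rho)$ and $\sigma_n^2(\rho)$, up to Monte Carlo noise.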

III. Mathematical Tools and Some Useful Results 

In this section, we present the tools we will use extensively throughout the paper. In Section III-A we
recall well known matrix results; in Section III-B we present two fundamental properties of Gaussian
models: the Integration by parts formula and Poincaré-Nash inequality for Gaussian vectors. Section III-C
is devoted to a cornerstone approximation result which roughly states that $R$ and $\tilde{R}$ can be replaced by $T$
and $\tilde{T}$ up to some well-quantified error. In Section III-D various variance estimates and approximation
rules are stated.

A. General results 

1) Some matrix inequalities: Let $A$ and $B$ be two $N \times N$ matrices with complex elements. Then

$$|\mathrm{tr}(AB)| \le \sqrt{\mathrm{tr}(AA^*)}\sqrt{\mathrm{tr}(BB^*)} . \qquad (7)$$

Assuming $A$ is Hermitian nonnegative, we have

$$|\mathrm{tr}(AB)| \le \|B\|\,\mathrm{tr}(A), \qquad (8)$$

where $\|\cdot\|$ is the spectral norm (see [21]).
2) The Resolvent: The Resolvent matrix $H_n(t)$ of matrix $Y_nY_n^*$ is defined as $H_n(t) = \left(\frac{t}{n}Y_nY_n^* + I_N\right)^{-1}$.
It is of constant use in this paper and we give here some of its properties. The following identity, also
known as the Resolvent identity:

$$H(t) = I_N - \frac{t}{n}H(t)YY^* \qquad (9)$$

follows from the mere definition of $H$. Furthermore, the spectral norm of the resolvent is readily bounded
by one:

$$\|H(t)\| \le 1 \quad\text{for } t \ge 0 . \qquad (10)$$
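Both properties of the resolvent are easy to confirm on a random draw. This is a minimal sketch assuming NumPy; the dimensions and the value of `t` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, t = 5, 7, 1.3

# A random N x n complex matrix standing in for Y.
Y = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
YYh = Y @ Y.conj().T

# Resolvent H(t) = (t/n YY^* + I_N)^{-1}.
H = np.linalg.inv(t / n * YYh + np.eye(N))

# Resolvent identity (9): H = I - (t/n) H YY^*.
identity_gap = np.max(np.abs(H - (np.eye(N) - t / n * H @ YYh)))

# Spectral norm bound (10): the eigenvalues of H are 1 / (1 + t * lambda_i / n) <= 1.
spectral_norm = np.linalg.norm(H, 2)

print(identity_gap, spectral_norm)
```

The identity is just $H\left(\frac{t}{n}YY^* + I\right) = I$ rearranged, and the norm bound holds because $YY^*$ is nonnegative.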

3) Bounded character of the mean of some empirical moments: Let $(B_n)_{n \in \mathbb{N}}$, with $B_n = \mathrm{diag}\left(b_1^{(n)}, \ldots, b_n^{(n)}\right)$,
be a sequence of deterministic $n \times n$ diagonal matrices. Assume (A1), and furthermore, that
$\sup_n \|B_n\| < \infty$. Then for every integer $k$, we have

$$\frac{1}{n}\mathbb{E}\left[\mathrm{tr}\left(\frac{1}{n}YBY^*\right)^k\right] \le K . \qquad (11)$$

Let us sketch a proof. Expanding the left hand side of (11) yields:

$$\frac{1}{n^{k+1}} \sum_{\substack{i_1, \ldots, i_k = 1:N \\ j_1, \ldots, j_k = 1:n}} \mathbb{E}\left[Y_{i_1 j_1} b_{j_1} \overline{Y_{i_2 j_1}}\, Y_{i_2 j_2} b_{j_2} \overline{Y_{i_3 j_2}} \cdots Y_{i_k j_k} b_{j_k} \overline{Y_{i_1 j_k}}\right] .$$

A close look at the argument of the $\mathbb{E}$ operator shows that, due to the independence of the $Y_{ij}$, we only
have $k + 1$ degrees of freedom in the choice of the indices $i_p$ and $j_q$. As all moments of the Gaussian
law exist and moreover $\|B_n\|$, $\|D_n\|$, and $\|\tilde{D}_n\|$ are bounded, this sum is of order 1 as $n \to \infty$.



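4) Differentiation formulas: Let $A$ be a $N \times N$ complex matrix and let $Q(A) = (I_N + A)^{-1}$. Let
$\delta A$ be a perturbation of $A$. Then

$$Q(A + \delta A) = Q(A) - Q(A)\,\delta A\,Q(A) + o(\|\delta A\|), \qquad (12)$$

where $o(\|\delta A\|)$ is negligible with respect to $\|\delta A\|$ in a neighborhood of 0. Writing $H(t) = [H_{pq}(t)]_{p,q=1}^{N}$,
we need the expression of the partial derivative $\partial H_{pq}/\partial Y_{ij}$. Using (12), we have:

$$\frac{\partial H_{pq}}{\partial Y_{ij}} = -\frac{t}{n}\left[H\frac{\partial (YY^*)}{\partial Y_{ij}}H\right]_{pq} = -\frac{t}{n}\left[H\left[\delta(k - i)\overline{Y_{lj}}\right]_{k,l=1}^{N}H\right]_{pq} = -\frac{t}{n}H_{pi}\left[y_j^*H\right]_q, \qquad (13)$$

where $\delta$ is the Kronecker function. Similarly, we can establish

$$\frac{\partial H_{pq}}{\partial \overline{Y_{ij}}} = -\frac{t}{n}\left[Hy_j\right]_p H_{iq} . \qquad (14)$$

The differential of $g(A) = \log\det(A)$ is given by $g(A + \delta A) = g(A) + \mathrm{tr}\left(A^{-1}\delta A\right) + o(\|\delta A\|)$.
We use this equation to derive the expression of $\partial I(t)/\partial \overline{Y_{ij}}$, also needed below:

$$\frac{\partial I}{\partial \overline{Y_{ij}}} = \frac{t}{n}\mathrm{tr}\left(H\frac{\partial (YY^*)}{\partial \overline{Y_{ij}}}\right) = \frac{t}{n}\mathrm{tr}\left(H\left[\delta(l - i)Y_{kj}\right]_{k,l=1}^{N}\right) = \frac{t}{n}\left[HY\right]_{ij} = \frac{t}{n}\left[Hy_j\right]_i . \qquad (15)$$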
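The derivative of the mutual information with respect to $\overline{Y_{ij}}$ can be compared with a finite-difference estimate, using the definition $\partial/\partial\bar{z} = \frac{1}{2}(\partial/\partial x + \mathrm{i}\,\partial/\partial y)$ from Section II-B. The sketch below assumes NumPy; the dimensions, the entry $(i, j)$, the step `h` and the helper names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, t = 3, 4, 1.5
Y = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)


def mutual_info(M):
    """I(t) = log det(t/n M M^* + I_N)."""
    return np.linalg.slogdet(np.eye(N) + t / n * M @ M.conj().T)[1]


def wirtinger_conj_derivative(f, M, i, j, h=1e-6):
    """Central-difference estimate of df/d(conj M_ij) = (df/dx + 1j * df/dy) / 2."""
    E = np.zeros_like(M)
    E[i, j] = 1.0
    dfdx = (f(M + h * E) - f(M - h * E)) / (2 * h)
    dfdy = (f(M + 1j * h * E) - f(M - 1j * h * E)) / (2 * h)
    return 0.5 * (dfdx + 1j * dfdy)


H = np.linalg.inv(np.eye(N) + t / n * Y @ Y.conj().T)
analytic = t / n * (H @ Y)   # entry (i, j) is (t/n) [H y_j]_i
i, j = 1, 2
numeric = wirtinger_conj_derivative(mutual_info, Y, i, j)
print(numeric, analytic[i, j])
```

Because $I(t)$ is real valued, its derivative with respect to $Y_{ij}$ is simply the complex conjugate of the value checked here.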
B. Gaussian tools 

1) An Integration by parts formula for Gaussian functionals: Let $\xi = [\xi_1, \ldots, \xi_M]^T$ be a complex
Gaussian random vector whose law is determined by $\mathbb{E}[\xi] = 0$, $\mathbb{E}[\xi\xi^T] = 0$, and $\mathbb{E}[\xi\xi^*] = \Xi$. Let
$\Gamma = \Gamma\left(\xi_1, \ldots, \xi_M, \overline{\xi_1}, \ldots, \overline{\xi_M}\right)$ be a $C^1$ complex function polynomially bounded together with its derivatives,
then:

$$\mathbb{E}\left[\xi_p\Gamma(\xi)\right] = \sum_{m=1}^{M}[\Xi]_{pm}\,\mathbb{E}\left[\frac{\partial\Gamma(\xi)}{\partial\overline{\xi_m}}\right] . \qquad (16)$$

This formula relies on an integration by parts and thus is referred to as the Integration by parts formula
for Gaussian vectors. It is widely used in Mathematical Physics [22] and has been used in Random Matrix
Theory in [16], [17].
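As a quick sanity check of (16), consider the scalar case $M = 1$ with $\xi \sim \mathcal{CN}(0, s)$, so that $\Xi = s$. The two test functions below are illustrative choices and do not appear in the paper.

```latex
% Example 1: \Gamma(\xi) = \bar{\xi}.
\mathbb{E}[\xi\,\bar{\xi}] = s
\qquad\text{and}\qquad
s\,\mathbb{E}\!\left[\frac{\partial \bar{\xi}}{\partial \bar{\xi}}\right] = s .

% Example 2: \Gamma(\xi) = \xi\,\bar{\xi}^{2}, so that \xi\,\Gamma(\xi) = |\xi|^{4}.
% Since |\xi|^{2} is exponentially distributed with mean s, \mathbb{E}|\xi|^{4} = 2 s^{2}.
\mathbb{E}[\xi\,\Gamma(\xi)] = \mathbb{E}|\xi|^{4} = 2 s^{2}
\qquad\text{and}\qquad
s\,\mathbb{E}\!\left[\frac{\partial (\xi\,\bar{\xi}^{2})}{\partial \bar{\xi}}\right]
  = s\,\mathbb{E}\!\left[2\,\xi\,\bar{\xi}\right] = 2 s^{2} .
```

In both cases the two sides of (16) agree, with $\xi$ and $\bar{\xi}$ treated as independent variables when differentiating.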



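2) Poincaré-Nash inequality: Let $\xi$ and $\Gamma$ be as previously and let $\nabla_z\Gamma = \left[\partial\Gamma/\partial z_1, \ldots, \partial\Gamma/\partial z_M\right]^T$
and $\nabla_{\bar{z}}\Gamma = \left[\partial\Gamma/\partial\overline{z_1}, \ldots, \partial\Gamma/\partial\overline{z_M}\right]^T$. Then the following inequality holds true:

$$\mathrm{var}\left(\Gamma(\xi)\right) \le \mathbb{E}\left[\nabla_z\Gamma(\xi)^T\,\Xi\,\overline{\nabla_z\Gamma(\xi)}\right] + \mathbb{E}\left[\left(\nabla_{\bar{z}}\Gamma(\xi)\right)^*\,\Xi\,\nabla_{\bar{z}}\Gamma(\xi)\right] . \qquad (17)$$

This inequality is well-known (see e.g. [19], [20]) and was first applied to Random Matrix Theory
in [18].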
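A scalar example shows what (17) delivers. Take $M = 1$, $\xi \sim \mathcal{CN}(0, s)$ and the test function $\Gamma(\xi) = |\xi|^2 = \xi\bar{\xi}$; this choice is illustrative and does not come from the paper.

```latex
% Left-hand side: |\xi|^{2} is exponentially distributed with mean s.
\mathrm{var}\left(|\xi|^{2}\right) = \mathbb{E}|\xi|^{4} - s^{2} = 2 s^{2} - s^{2} = s^{2} .

% Right-hand side: \partial\Gamma/\partial z = \bar{\xi} and \partial\Gamma/\partial\bar{z} = \xi.
s\,\mathbb{E}\left|\frac{\partial \Gamma}{\partial z}\right|^{2}
  + s\,\mathbb{E}\left|\frac{\partial \Gamma}{\partial \bar{z}}\right|^{2}
  = s \cdot s + s \cdot s = 2 s^{2} .
```

The inequality $s^2 \le 2s^2$ holds, and the bound has the right order of magnitude: it controls the variance without requiring the exact fourth moment.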

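When $\xi$ is the vector of the stacked columns of matrix $Y$, i.e. $\xi = [Y_{11}, \ldots, Y_{Nn}]^T$, formula (16)
becomes:

$$\mathbb{E}\left[Y_{ij}\Gamma(Y)\right] = d_i\tilde{d}_j\,\mathbb{E}\left[\frac{\partial\Gamma(Y)}{\partial\overline{Y_{ij}}}\right], \qquad (18)$$

while inequality (17) writes:

$$\mathrm{var}\left(\Gamma(Y)\right) \le \sum_{i=1}^{N}\sum_{j=1}^{n} d_i\tilde{d}_j\,\mathbb{E}\left[\left|\frac{\partial\Gamma(Y)}{\partial Y_{ij}}\right|^2 + \left|\frac{\partial\Gamma(Y)}{\partial\overline{Y_{ij}}}\right|^2\right] . \qquad (19)$$

Poincaré-Nash inequality turns out to be extremely useful to deal with variances of various quantities
of interest related to random matrices. For the reader's convenience, we provide a proof in the Appendix,
and in order to give right away the flavour of such results, we state and prove the following: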
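Proposition 3: Assume that the setting of Theorem 1 holds and let $A_n$ be a $N \times N$ real diagonal
matrix whose spectral norm is uniformly bounded in $n$. Then

$$\mathrm{var}\left(\frac{1}{n}\mathrm{tr}\,AH\right) = O\left(n^{-2}\right) .$$

Proof: We apply inequality (19) to the function $\Gamma(Y) = \frac{1}{n}\mathrm{tr}\,AH$. Using (13), we have

$$\frac{\partial\Gamma}{\partial Y_{ij}} = \frac{1}{n}\sum_{p=1}^{N}A_{pp}\frac{\partial H_{pp}}{\partial Y_{ij}} = -\frac{t}{n^2}\left[y_j^*HAH\right]_i .$$

Therefore, denoting by $A_{\max}$ the upper bound $A_{\max} = \sup_n\|A_n\|$ and noticing that $\left|\partial\Gamma/\partial Y_{ij}\right| = \left|\partial\Gamma/\partial\overline{Y_{ij}}\right|$,
we have:

$$\begin{aligned}
\mathrm{var}\,\Gamma(Y) &\le \frac{2t^2}{n^4}\sum_{i=1}^{N}\sum_{j=1}^{n} d_i\tilde{d}_j\,\mathbb{E}\left|\left[y_j^*HAH\right]_i\right|^2
= \frac{2t^2}{n^4}\sum_{j=1}^{n}\tilde{d}_j\,\mathbb{E}\left(y_j^*HAHDHAHy_j\right) \\
&= \frac{2t^2}{n^3}\,\mathbb{E}\,\mathrm{tr}\left(HAHDHAH\,\frac{Y\tilde{D}Y^*}{n}\right)
\overset{(a)}{\le} \frac{2t^2}{n^3}\,\mathbb{E}\left(\|H\|^4\|A\|^2\|D\|\,\mathrm{tr}\left(\frac{Y\tilde{D}Y^*}{n}\right)\right) \\
&\overset{(b)}{\le} \frac{2A_{\max}^2 d_{\max}t^2}{n^2}\,\frac{1}{n}\mathbb{E}\,\mathrm{tr}\left(\frac{Y\tilde{D}Y^*}{n}\right)
\overset{(c)}{\le} \frac{K}{n^2},
\end{aligned}$$

where inequality (a) follows from (8), (b) follows from (10) and from the bounded character of $\|A_n\|$
and $\|D_n\|$, and (c) follows from (11).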
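The $O(n^{-2})$ rate of Proposition 3 can be observed empirically: doubling $n$ should divide the variance of $\frac{1}{n}\mathrm{tr}\,AH$ by roughly four. The following Monte Carlo sketch assumes NumPy; the variance profiles, the matrix $A$, the number of trials and the function name are arbitrary illustrative choices.

```python
import numpy as np


def var_normalized_trace(n, t=1.0, trials=400, seed=0):
    """Monte Carlo estimate of var( (1/n) tr(A H) ) with N = n."""
    rng = np.random.default_rng(seed)
    N = n
    d = np.linspace(0.5, 1.5, N)         # diagonal of D
    d_tilde = np.linspace(0.5, 2.0, n)   # diagonal of D~
    a = np.linspace(0.5, 1.5, N)         # diagonal of the bounded matrix A
    vals = np.empty(trials)
    for k in range(trials):
        X = (rng.standard_normal((N, n))
             + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
        Y = np.sqrt(d)[:, None] * X * np.sqrt(d_tilde)[None, :]
        H = np.linalg.inv(np.eye(N) + t / n * Y @ Y.conj().T)
        vals[k] = np.sum(a * np.diag(H).real) / n
    return vals.var()


v_small = var_normalized_trace(15)
v_large = var_normalized_trace(30)
# If var = O(n^-2), then n^2 * var should be of the same order for both sizes.
ratio = (15 ** 2 * v_small) / (30 ** 2 * v_large)
print(v_small, v_large, ratio)
```

A sum of $n$ independent terms divided by $n$ would have a variance of order $1/n$; the faster $1/n^2$ decay reflects the strong dependence between the eigenvalues of $YY^*$.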



C. Approximation rules 

The following theorem is crucial in order to prove Theorems 1 and 2. Roughly speaking, it allows one to
replace matrices $R$ and $\tilde{R}$ by $T$ and $\tilde{T}$ up to a well-quantified small error.

Theorem 3: Let $(A_n)$ and $(B_n)$ be two sequences of respectively $N \times N$ and $n \times n$ diagonal
deterministic matrices whose spectral norms are uniformly bounded in $n$. Then the following hold true:

$$\frac{1}{n}\mathrm{tr}\,AR = \frac{1}{n}\mathrm{tr}\,AT + O\left(\frac{1}{n^2}\right), \qquad (20)$$

$$\frac{1}{n}\mathrm{tr}\,B\tilde{R} = \frac{1}{n}\mathrm{tr}\,B\tilde{T} + O\left(\frac{1}{n^2}\right) . \qquad (21)$$

Proof of Theorem 3 is postponed to Appendix D.

D. More variance estimates and more approximations rules 

We collect here a few results whose proofs rely on the Integration by parts formula (18), on Poincaré-
Nash inequality, and on Theorem 3. The proofs of these results, although systematic, are somewhat
lengthy and are therefore postponed to the Appendix. These results will be used extensively in Section V.

Proposition 4: In the setting of Theorem 1, let $A_n$ and $B_n$ be uniformly bounded real diagonal matrices
of size $N \times N$ and $n \times n$. Consider the following functions:

$$\Phi(Y) = \frac{1}{n}\mathrm{tr}\left(AH\frac{YBY^*}{n}\right), \qquad \Psi(Y) = \frac{1}{n}\mathrm{tr}\left(AHDH\frac{YBY^*}{n}\right) .$$

Then,

1) The following inequalities hold true:

$$\mathrm{var}\left(\Phi(Y)\right) = O(n^{-2}), \qquad \mathrm{var}\left(\Psi(Y)\right) = O(n^{-2}) .$$

2) The following approximations hold true:

$$\mathbb{E}[\Phi(Y)] = \frac{1}{n}\mathrm{tr}\left(\tilde{D}\tilde{T}B\right)\frac{1}{n}\mathrm{tr}\left(ADT\right) + O\left(n^{-2}\right), \qquad (22)$$

$$\mathbb{E}[\Psi(Y)] = \frac{1}{1 - t^2\gamma\tilde{\gamma}}\left(\frac{1}{n^2}\mathrm{tr}\left(\tilde{D}\tilde{T}B\right)\mathrm{tr}\left(AD^2T^2\right) - \frac{t\gamma}{n^2}\mathrm{tr}\left(\tilde{D}^2\tilde{T}^2B\right)\mathrm{tr}\left(ADT\right)\right) + O\left(\frac{1}{n^2}\right) . \qquad (23)$$

The variance inequalities and the approximation rules are proved in the Appendix.

IV. First Order Moment Approximation: Proof of Theorem 1

This section is devoted to the proof of the following approximation:

$$\mathbb{E}[I_n(\rho)] = V_n(\rho) + O\left(n^{-1}\right), \qquad (24)$$

where

$$V_n(\rho) = \log\det\left(I + \rho\tilde{\delta}_n(\rho)D_n\right) + \log\det\left(I + \rho\delta_n(\rho)\tilde{D}_n\right) - n\rho\,\delta_n(\rho)\tilde{\delta}_n(\rho) . \qquad (25)$$

This result already appears in [7] and is proved under greater generality in [9]. The proof presented here
is new and relies on Gaussian tools.

Outline of the proof 

The proof is divided into three steps. We first make some preliminary remarks. Notice that the mutual
information can be expressed as $I(\rho) = \int_0^\rho \mathrm{tr}\left(n^{-1}H(t)YY^*\right)dt$. In particular,

$$\mathbb{E}[I(\rho)] = \int_0^\rho \mathrm{tr}\,\mathbb{E}\left[H(t)\frac{YY^*}{n}\right]dt . \qquad (26)$$
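The integral representation of $I(\rho)$ holds for every realisation of $Y$ and can be checked by numerical quadrature. The sketch below assumes NumPy and uses a Gauss-Legendre rule; the dimensions, the value of $\rho$ and the number of quadrature nodes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, rho = 5, 8, 2.0
Y = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
gram = Y @ Y.conj().T / n            # YY^* / n


def integrand(t):
    """tr( H(t) YY^*/n ) with H(t) = (t YY^*/n + I_N)^{-1}."""
    H = np.linalg.inv(np.eye(N) + t * gram)
    return np.trace(H @ gram).real


# Gauss-Legendre nodes and weights on [-1, 1], mapped to [0, rho].
x, w = np.polynomial.legendre.leggauss(60)
nodes = 0.5 * rho * (x + 1.0)
integral = 0.5 * rho * sum(wk * integrand(tk) for wk, tk in zip(w, nodes))

# Direct evaluation of I(rho) = log det(rho/n YY^* + I_N).
direct = np.linalg.slogdet(np.eye(N) + rho * gram)[1]
print(integral, direct)
```

The identity follows from $\frac{d}{dt}\log\det\left(I + tM\right) = \mathrm{tr}\left((I + tM)^{-1}M\right)$ with $M = YY^*/n$ and $I(0) = 0$.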
In order to study the asymptotic behaviour of $\mathbb{E}[I(\rho)]$, it is thus enough to study $\mathrm{tr}\,\mathbb{E}\left(H(t)\frac{YY^*}{n}\right)$ for
$n \to +\infty$, up to an integration. The Resolvent identity (9) yields

$$\mathrm{tr}\,\mathbb{E}\left[H(t)\frac{YY^*}{n}\right] = \mathrm{tr}\,\mathbb{E}\left[\frac{I - H(t)}{t}\right] .$$

We are therefore led to the study of $\mathbb{E}[\mathrm{tr}(H(t))]$. We now describe the three steps of the proof.
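A. In the first part of the proof, we expand $\mathbb{E}H(t)$ with the help of the Integration by parts formula
(18). This derivation will bring to the fore the deterministic diagonal matrix $R$, and Poincaré-Nash
inequality will then allow us to obtain the following approximation:

$$\mathbb{E}\,\mathrm{tr}\,AH = \mathrm{tr}\,AR + O\left(n^{-1}\right),$$

for every diagonal matrix $A$ bounded in the spectral norm. Here are the main steps, gathered in
an informal way. Differentiating the term $\mathbb{E}\left([Hy_j]_p\overline{Y_{pj}}\right)$, we obtain:

$$\mathbb{E}\left([Hy_j]_p\overline{Y_{pj}}\right) = d_p\tilde{d}_j\,\mathbb{E}[H_{pp}] - t\tilde{d}_j\,\mathbb{E}\left[\frac{1}{n}\mathrm{tr}(DH)\,[Hy_j]_p\overline{Y_{pj}}\right],$$

from which we will extract $\mathbb{E}[H_{pp}]$ later on. At this point, Poincaré-Nash inequality yields some
decorrelation up to $O\left(n^{-1}\right)$ and we obtain:

$$\mathbb{E}\left[\frac{1}{n}\mathrm{tr}(DH)\,[Hy_j]_p\overline{Y_{pj}}\right] \approx \mathbb{E}\left[\frac{1}{n}\mathrm{tr}(DH)\right]\mathbb{E}\left[[Hy_j]_p\overline{Y_{pj}}\right] = \alpha\,\mathbb{E}\left[[Hy_j]_p\overline{Y_{pj}}\right] .$$

This approximation allows us to isolate $\mathbb{E}\left([Hy_j]_p\overline{Y_{pj}}\right)$:

$$(1 + t\tilde{d}_j\alpha)\,\mathbb{E}\left([Hy_j]_p\overline{Y_{pj}}\right) \approx d_p\tilde{d}_j\,\mathbb{E}[H_{pp}] \quad\Longrightarrow\quad \mathbb{E}\left([Hy_j]_p\overline{Y_{pj}}\right) \approx d_p\tilde{d}_j\tilde{r}_j\,\mathbb{E}[H_{pp}] .$$

Now summing over $j$ and using the Resolvent identity $\mathbb{E}H_{pp} = 1 - \frac{t}{n}\sum_{j=1}^{n}\mathbb{E}\left([Hy_j]_p\overline{Y_{pj}}\right)$ in the
previous equation yields:

$$1 - \mathbb{E}H_{pp} \approx t\tilde{\alpha}d_p\,\mathbb{E}H_{pp}, \quad\text{that is}\quad \mathbb{E}H_{pp} \approx \frac{1}{1 + t\tilde{\alpha}d_p} = r_p .$$

All the technical details are provided in Section IV-A.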



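B. The second step follows from the approximation rule (20) stated in Section III-C which immediately
yields

$$\mathbb{E}\,\mathrm{tr}\,AH = \mathrm{tr}\,AT + O\left(n^{-1}\right) .$$

This in turn will imply that

$$\mathbb{E}\,\mathrm{tr}\left(H(t)\frac{YY^*}{n}\right) = \mathrm{tr}\left(\frac{I - \mathbb{E}H}{t}\right) = \mathrm{tr}\left(\frac{I - T}{t}\right) + \varepsilon_n(t) \overset{(a)}{=} n\delta(t)\tilde{\delta}(t) + \varepsilon_n(t),$$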



15 December 2006 



DRAFT 



SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 

where (a) follows from the fact that $I - T = t\tilde{\delta}D\left(I + t\tilde{\delta}D\right)^{-1}$.



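C. In the third step, we integrate the previous equality:

$$\int_0^\rho \mathbb{E}\,\mathrm{tr}\left(H(t)\frac{YY^*}{n}\right)dt = n\int_0^\rho \delta(t)\tilde{\delta}(t)\,dt + \int_0^\rho \varepsilon_n(t)\,dt .$$

We identify $n\int_0^\rho \delta(t)\tilde{\delta}(t)\,dt$ with $V_n(\rho)$ as given by (25) and check that $\int_0^\rho \varepsilon_n(t)\,dt = O\left(n^{-1}\right)$.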

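A. Development of $\mathbb{E}(\mathrm{tr}\,AH(t))$ and Approximation by $\mathrm{tr}\,AR(t)$

In order to study $\mathbb{E}(\mathrm{tr}\,AH(t))$, we first consider the diagonal entries $H_{pp}(t)$ of $H(t)$. For each index
$j$, we have

$$\mathbb{E}\left([Hy_j]_p\overline{Y_{pj}}\right) = \sum_{i=1}^{N}\mathbb{E}\left(H_{pi}Y_{ij}\overline{Y_{pj}}\right) .$$

We now apply the Integration by parts formula (18) to the summand of the right hand side for function
$\Gamma$ defined as $\Gamma(Y) = H_{pi}\overline{Y_{pj}}$. This yields:

$$\mathbb{E}\left(H_{pi}Y_{ij}\overline{Y_{pj}}\right) = d_i\tilde{d}_j\,\mathbb{E}[H_{pi}]\,\delta(i - p) - d_i\tilde{d}_j\frac{t}{n}\,\mathbb{E}\left([Hy_j]_pH_{ii}\overline{Y_{pj}}\right) . \qquad (27)$$

Therefore,

$$\mathbb{E}\left([Hy_j]_p\overline{Y_{pj}}\right) = d_p\tilde{d}_j\,\mathbb{E}[H_{pp}] - t\tilde{d}_j\,\mathbb{E}\left[\frac{1}{n}\mathrm{tr}(DH)\,[Hy_j]_p\overline{Y_{pj}}\right], \qquad (28)$$

from which we shall extract $\mathbb{E}[H_{pp}]$ later on. Recall at this point that $\mathrm{var}\left(n^{-1}\mathrm{tr}\,DH(t)\right) = O\left(n^{-2}\right)$
by Proposition 3. Recall also the following notations: $\beta = n^{-1}\mathrm{tr}(DH)$, $\alpha = \mathbb{E}[\beta]$, and $\overset{\circ}{\beta} = \beta - \alpha$.
Plugging the relation $\beta = \alpha + \overset{\circ}{\beta}$ into (28), we get

$$\mathbb{E}\left[[Hy_j]_p\overline{Y_{pj}}\right] = d_p\tilde{d}_j\,\mathbb{E}[H_{pp}] - t\tilde{d}_j\alpha\,\mathbb{E}\left[[Hy_j]_p\overline{Y_{pj}}\right] - t\tilde{d}_j\,\mathbb{E}\left[\overset{\circ}{\beta}\,[Hy_j]_p\overline{Y_{pj}}\right] . \qquad (29)$$

Solving this equation w.r.t. $\mathbb{E}\left[[Hy_j]_p\overline{Y_{pj}}\right]$ provides:

$$\mathbb{E}\left[[Hy_j]_p\overline{Y_{pj}}\right] = d_p\tilde{d}_j\tilde{r}_j\,\mathbb{E}[H_{pp}] - t\tilde{d}_j\tilde{r}_j\,\mathbb{E}\left[\overset{\circ}{\beta}\,[Hy_j]_p\overline{Y_{pj}}\right] \quad\text{for } 1 \le j \le n, \qquad (30)$$

where $\tilde{r}_j(t) = \frac{1}{1 + t\alpha(t)\tilde{d}_j}$. Summing (30) over $j$ and dividing by $n$ yields:

$$\mathbb{E}\left[\left[H\frac{YY^*}{n}\right]_{pp}\right] = \tilde{\alpha}d_p\,\mathbb{E}[H_{pp}] - t\,\mathbb{E}\left[\overset{\circ}{\beta}\left[H\frac{Y\tilde{D}\tilde{R}Y^*}{n}\right]_{pp}\right], \qquad (31)$$

where $\tilde{R}$ is the diagonal matrix $\mathrm{diag}\left(\tilde{r}_j(t)\right) = \left(I + \alpha t\tilde{D}\right)^{-1}$ and $\tilde{\alpha} = \frac{1}{n}\mathrm{tr}\,\tilde{D}\tilde{R}$. In order to obtain an
expression for $\mathbb{E}[H_{pp}]$, we plug the identity (31) into the Resolvent identity:

$$\mathbb{E}[H_{pp}] = 1 - t\,\mathbb{E}\left[\left[H\frac{YY^*}{n}\right]_{pp}\right]$$

and obtain:

$$\mathbb{E}[H_{pp}] = r_p + t^2r_p\,\mathbb{E}\left[\overset{\circ}{\beta}\left[H\frac{Y\tilde{D}\tilde{R}Y^*}{n}\right]_{pp}\right] \qquad (32)$$

with $r_p(t) = (1 + t\tilde{\alpha}d_p)^{-1}$. Let $A$ be a $N \times N$ diagonal matrix with bounded spectral norm. Multiplying
(32) by $A$'s components and summing over $p$ yields:

$$\mathbb{E}\,\mathrm{tr}(AH) = \mathrm{tr}(AR) + nt^2\,\mathbb{E}\left[\overset{\circ}{\beta}\,\Phi(Y)\right],$$

where $\Phi(Y) = \frac{1}{n}\mathrm{tr}\left(ARH\frac{Y\tilde{D}\tilde{R}Y^*}{n}\right)$. As $\overset{\circ}{\beta}$ is zero-mean, $\mathbb{E}[\overset{\circ}{\beta}\Phi] = \mathbb{E}[\overset{\circ}{\beta}\overset{\circ}{\Phi}]$. In particular, Cauchy-Schwarz
inequality yields:

$$\left|\mathbb{E}\,\overset{\circ}{\beta}\overset{\circ}{\Phi}\right| \le \sqrt{\mathrm{var}(\beta)}\sqrt{\mathrm{var}(\Phi)} .$$

Recall that $\mathrm{var}(\beta) = O\left(n^{-2}\right)$ by Proposition 3. Since $\|R_n\|$ and $\|\tilde{D}_n\tilde{R}_n\|$ are both bounded by Assumption
(A1) and by the definitions of $R_n$ and $\tilde{R}_n$, one can directly apply the result of Proposition 4 to $\Phi$ in
order to get $\mathrm{var}(\Phi) = O\left(n^{-2}\right)$.

We have therefore proved the following:

Proposition 5: In the setting of Theorem 1, let $A$ be a uniformly bounded diagonal $N \times N$ matrix.
Then for every $t \in \mathbb{R}_+$,

$$\mathbb{E}(\mathrm{tr}\,AH(t)) = \mathrm{tr}\,AR(t) + O\left(n^{-1}\right) . \qquad (33)$$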



B. The Deterministic Approximation $T(t)$.

Proposition 5 provides a deterministic equivalent to $\mathbb{E}(\mathrm{tr}\,AH)$ since matrix $R$ is deterministic; however
its elements still depend on $\tilde{\alpha} = n^{-1}\mathrm{tr}(\tilde{D}\tilde{R})$, which itself depends on $\alpha = \mathbb{E}\left(n^{-1}\mathrm{tr}\,DH\right)$, an unknown
parameter. The next step is therefore to apply Theorem 3 to approximate matrix $R$ by $T$, which only
depends on $D$ and $\tilde{D}$ and on $\delta$ and $\tilde{\delta}$, the solutions of (1). Theorem 3 together with Equation (33)
imply that:

$$\mathbb{E}(\mathrm{tr}\,AH) = \mathrm{tr}(AT) + O\left(n^{-1}\right) . \qquad (34)$$

Since $T$ only depends on $\delta$ and $\tilde{\delta}$, (34) provides a deterministic equivalent of $\mathbb{E}(\mathrm{tr}\,AH)$ in terms of $\delta$ and
$\tilde{\delta}$. Note that taking $A = D$ yields in particular $\alpha = \delta + O(n^{-2})$ while a direct application of Theorem
3 to the matrix $\tilde{D}$ yields $\tilde{\alpha} = \tilde{\delta} + O(n^{-2})$.
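The closeness of $\alpha$ to $\delta$ and of $\tilde{\alpha}$ to $\tilde{\delta}$ can be observed numerically, with $\alpha$ estimated by Monte Carlo. This is an illustrative sketch assuming NumPy; the variance profiles, dimensions, number of trials and helper names are hypothetical, and the Monte Carlo error dominates the $O(n^{-2})$ gap.

```python
import numpy as np


def solve_fixed_point(d, d_tilde, t, iters=2000):
    """Alternating iteration for the system (1); traces are normalised by n."""
    n = d_tilde.size
    delta = delta_tilde = 1.0
    for _ in range(iters):
        delta = np.sum(d / (1.0 + t * delta_tilde * d)) / n
        delta_tilde = np.sum(d_tilde / (1.0 + t * delta * d_tilde)) / n
    return delta, delta_tilde


def estimate_alpha(d, d_tilde, t, trials, rng):
    """Monte Carlo estimate of alpha = E[(1/n) tr(D H(t))]."""
    N, n = d.size, d_tilde.size
    total = 0.0
    for _ in range(trials):
        X = (rng.standard_normal((N, n))
             + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
        Y = np.sqrt(d)[:, None] * X * np.sqrt(d_tilde)[None, :]
        H = np.linalg.inv(np.eye(N) + t / n * Y @ Y.conj().T)
        total += np.sum(d * np.diag(H).real) / n
    return total / trials


rng = np.random.default_rng(0)
d = np.linspace(0.5, 1.5, 20)        # N = 20
d_tilde = np.linspace(0.5, 2.0, 30)  # n = 30
t = 2.0

delta, delta_tilde = solve_fixed_point(d, d_tilde, t)
alpha_hat = estimate_alpha(d, d_tilde, t, trials=300, rng=rng)
# alpha~ = (1/n) tr(D~ R~) with R~ = (I + t * alpha * D~)^{-1}.
alpha_tilde_hat = np.sum(d_tilde / (1.0 + t * alpha_hat * d_tilde)) / d_tilde.size
print(alpha_hat, delta)
print(alpha_tilde_hat, delta_tilde)
```

In practice this means the unknown parameters $\alpha$ and $\tilde{\alpha}$ never need to be simulated: solving the deterministic system (1) is enough.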
We are now in a position to describe the behaviour of $\mathbb{E}\,\mathrm{tr}\left(H(t)\frac{YY^*}{n}\right)$ by using the Resolvent identity.
From (9) and (34), taking $A = I$, we immediately obtain:

$$\mathbb{E}\,\mathrm{tr}\left(H(t)\frac{YY^*}{n}\right) = \frac{1}{t}\mathrm{tr}\left(I - T(t)\right) + O\left(n^{-1}\right) .$$

As $I - T(t) = \left(T(t)^{-1} - I\right)T(t) = t\tilde{\delta}(t)DT(t)$, we eventually get that

$$\mathbb{E}\left[\mathrm{tr}\,H(t)\frac{YY^*}{n}\right] = n\delta(t)\tilde{\delta}(t) + \varepsilon_n(t), \qquad (35)$$

where the error $\varepsilon_n(t)$ is a $O\left(n^{-1}\right)$ term.

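C. Recovering the Deterministic Approximation $V(\rho)$ of $\mathbb{E}[I(\rho)]$.

As mentioned previously, $\varepsilon_n(t)$ is a $O\left(n^{-1}\right)$ term, i.e. $|\varepsilon_n(t)| \le K_tn^{-1}$. One can easily keep track
of $K_t$ in the derivations that lead to (35) and prove that $K_t$ is bounded on the compact interval $[0, \rho]$.
In particular, $|\varepsilon_n(t)| \le Kn^{-1}$ on the compact interval $[0, \rho]$ for some $K > 0$. The proof of this fact is
omitted.

As $n\,\varepsilon_n(t)$ is uniformly bounded on $[0, \rho]$, we have $\left|\int_0^\rho \varepsilon_n(t)\,dt\right| = O\left(n^{-1}\right)$. Therefore,

$$\mathbb{E}[I(\rho)] = n\int_0^\rho \delta(t)\tilde{\delta}(t)\,dt + O\left(n^{-1}\right) .$$

Consider now

$$V(\rho) = W\left(\rho, \delta(\rho), \tilde{\delta}(\rho)\right),$$

where function $W(\rho, \delta, \tilde{\delta})$ is defined by

$$W(\rho, \delta, \tilde{\delta}) = \log\det\left(I + \rho\tilde{\delta}D\right) + \log\det\left(I + \rho\delta\tilde{D}\right) - n\rho\,\delta\tilde{\delta} .$$

One can easily check that:

$$\frac{\partial W}{\partial\delta} = \rho\left(\mathrm{tr}\left(\tilde{D}\left(I + \rho\delta\tilde{D}\right)^{-1}\right) - n\tilde{\delta}\right) \quad\text{and}\quad \frac{\partial W}{\partial\tilde{\delta}} = \rho\left(\mathrm{tr}\left(D\left(I + \rho\tilde{\delta}D\right)^{-1}\right) - n\delta\right) .$$

As the pair $\left(\delta(\rho), \tilde{\delta}(\rho)\right)$ satisfies (1), the above partial derivatives evaluated at point $\left(\rho, \delta(\rho), \tilde{\delta}(\rho)\right)$ are
zero. Therefore,

$$\frac{dV}{d\rho} = \left(\frac{\partial W}{\partial\rho}\right)_{\left(\rho, \delta(\rho), \tilde{\delta}(\rho)\right)} = n\delta(\rho)\tilde{\delta}(\rho), \qquad (36)$$

which in turn implies (24). Theorem 1 is proved.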
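The derivative identity for $V$ is purely deterministic and can be verified by finite differences. The sketch below assumes NumPy and a fixed-point iteration for system (1); the diagonals, the value of $\rho$, the step `h` and the helper names are arbitrary illustrative choices.

```python
import numpy as np


def solve_fixed_point(d, d_tilde, t, iters=2000):
    """Alternating iteration for the system (1); traces are normalised by n."""
    n = d_tilde.size
    delta = delta_tilde = 1.0
    for _ in range(iters):
        delta = np.sum(d / (1.0 + t * delta_tilde * d)) / n
        delta_tilde = np.sum(d_tilde / (1.0 + t * delta * d_tilde)) / n
    return delta, delta_tilde


def V(d, d_tilde, rho):
    """V(rho) = W(rho, delta(rho), delta~(rho)), with W as defined in the text."""
    n = d_tilde.size
    delta, delta_tilde = solve_fixed_point(d, d_tilde, rho)
    return (np.sum(np.log1p(rho * delta_tilde * d))
            + np.sum(np.log1p(rho * delta * d_tilde))
            - n * rho * delta * delta_tilde)


d = np.linspace(0.5, 1.5, 6)
d_tilde = np.linspace(0.2, 2.0, 8)
n, rho, h = d_tilde.size, 1.7, 1e-5

# Central finite difference of V versus the closed form n * delta * delta~.
finite_difference = (V(d, d_tilde, rho + h) - V(d, d_tilde, rho - h)) / (2 * h)
delta, delta_tilde = solve_fixed_point(d, d_tilde, rho)
closed_form = n * delta * delta_tilde
print(finite_difference, closed_form)
```

The dependence of $\delta$ and $\tilde{\delta}$ on $\rho$ does not contribute to the derivative precisely because the partial derivatives of $W$ in $\delta$ and $\tilde{\delta}$ vanish at the solution of (1).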



Remark 1 (On the deterministic approximation $T$): The deterministic approximation $T$ can be used to
approximate functionals of the eigenvalues of $YY^*$ other than the mutual information $\log\det\left(\rho n^{-1}YY^* + I\right)$
(see for instance [9]). This relies on a specific representation of $T$: the spectral theorem for Hermitian
matrices yields the integral representation:

$$\frac{1}{n}\mathrm{tr}\,H_n(z) = \int_0^\infty \frac{N_n(d\lambda)}{1 + \lambda z}, \quad z \in \mathbb{C} \setminus \mathbb{R}_-,$$

where $N_n$ represents the empirical distribution of the eigenvalues of $YY^*$. It can be shown that $n^{-1}\mathrm{tr}\,T$
admits a similar representation:

$$\frac{1}{n}\mathrm{tr}\,T_n(z) = \int_0^\infty \frac{\pi_n(d\lambda)}{1 + \lambda z}, \quad z \in \mathbb{C} \setminus \mathbb{R}_-,$$

where $\pi_n$ is a probability measure. Finally, one can prove that $\int_0^\infty f(\lambda)N_n(d\lambda) - \int_0^\infty f(\lambda)\pi_n(d\lambda)$ converges
to zero almost surely for every continuous bounded function $f$ (see [9] for details).

V. Second order Analysis: Proof of Theorem 2

This section is devoted to the proof of the Central Limit Theorem:

$$\sigma_n^{-1}(\rho)\left(I_n(\rho) - V_n(\rho)\right) \xrightarrow[n\to\infty]{\mathcal{D}} \mathcal{N}(0, 1),$$

where $\xrightarrow{\mathcal{D}}$ stands for the convergence in distribution.

Outline of the proof

Denote by $\psi_n(u, \rho) = \mathbb{E}\left[e^{\mathrm{i}u\left(I_n(\rho) - V_n(\rho)\right)}\right]$ the characteristic function of $I_n(\rho) - V_n(\rho)$. The proof is
based on the fact that in order to establish the convergence (in distribution) of $\sigma_n^{-1}(\rho)\left(I(\rho) - V(\rho)\right)$
towards $\mathcal{N}(0, 1)$, it is sufficient to prove that:

$$h_n(u) = \psi_n(u, \rho) - e^{-u^2\sigma_n^2(\rho)/2} \xrightarrow[n\to\infty]{} 0, \quad \forall u \in \mathbb{R} .$$

In fact, recall by Proposition 2 that the sequence $u/\sigma_n(\rho)$ belongs to a compact interval $K_u$ since $\sigma_n(\rho)$
is bounded away from zero. If now $h_n(u) \to 0$ for every $u$, it converges uniformly to zero on the compact
set $K_u$ due to the continuity of $h_n$. Therefore,

$$h_n\left(\frac{u}{\sigma_n(\rho)}\right) = \mathbb{E}\exp\left(\mathrm{i}u\,\frac{I_n(\rho) - V_n(\rho)}{\sigma_n(\rho)}\right) - e^{-u^2/2} \xrightarrow[n\to\infty]{} 0,$$

which proves the CLT. The proof of the convergence of $h_n(u)$ towards zero is divided into three steps.
A. We first differentiate $\psi_n(u, t)$ with respect to $t$ in order to obtain a differential equation of the
form:

$$\frac{\partial\psi_n(u, t)}{\partial t} = -\frac{u^2}{2}\eta_n(t)\psi_n(u, t) + \varepsilon_n(u, t) . \qquad (37)$$

In order to obtain the differential equation (37), we first develop $\partial\psi/\partial t$ with the help of the
Integration by parts formula (18). We then use Poincaré-Nash inequality to prove that relevant
variances are of order $O(n^{-2})$. This will enable us to decorrelate various expectations, i.e. to express
them as products of expectations up to negligible terms. We shall then use the approximation rules
stated in Proposition 4 in Section III-D to deal with the obtained expectations.



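B. The second step is devoted to the identification of the variance, that is to the proof of the identity

\[ \int_0^\rho \eta_n(t)\,dt = \sigma_n^2(\rho), \]

where $\sigma_n^2$ is the variance defined previously, i.e. $\sigma_n^2(\rho) = -\log\big(1 - \rho^2\gamma(\rho)\tilde\gamma(\rho)\big)$.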

C. The third step is devoted to the integration of (37). Instead of directly integrating (37), we introduce $K_n(u,\rho) = \psi_n(u,\rho)\,e^{u^2\sigma_n^2(\rho)/2}$, which satisfies the following differential equation:

\[ \frac{\partial K_n(u,t)}{\partial t} = \varepsilon_n(u,t)\,e^{u^2\sigma_n^2(t)/2}. \tag{38} \]

Taking into account the obvious facts that $\psi_n(u,0) = 1$, $\sigma_n^2(0) = 0$ and therefore that $K_n(u,0) = 1$, we shall obtain that

\[ K_n(u,\rho) = 1 + \int_0^\rho \varepsilon_n(u,t)\,e^{u^2\sigma_n^2(t)/2}\,dt, \]

and prove that $\int_0^\rho \varepsilon_n(u,t)\,e^{u^2\sigma_n^2(t)/2}\,dt = O(n^{-1})$. This will yield in turn that:

\[ \psi_n(u,\rho) = \big(1 + O(n^{-1})\big)\,e^{-u^2\sigma_n^2(\rho)/2} \overset{(a)}{=} e^{-u^2\sigma_n^2(\rho)/2} + O(n^{-1}), \]

where (a) follows from Proposition 2. The theorem will then be proved.
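The integrating-factor step behind step C can be written out in one line. The following is a short worked derivation, assuming only the differential equation (37) of step A and the identity $\eta_n = d\sigma_n^2/dt$ of step B; it introduces no symbol beyond those of the outline.

```latex
\begin{align*}
\frac{\partial}{\partial t}\Big[\psi_n(u,t)\,e^{u^2\sigma_n^2(t)/2}\Big]
  &= \Big[\frac{\partial\psi_n(u,t)}{\partial t}
        + \frac{u^2}{2}\,\eta_n(t)\,\psi_n(u,t)\Big]\,e^{u^2\sigma_n^2(t)/2} \\
  &= \varepsilon_n(u,t)\,e^{u^2\sigma_n^2(t)/2}.
\end{align*}
```

Integrating this identity between $0$ and $\rho$ and using $K_n(u,0) = 1$ gives the integral expression of $K_n(u,\rho)$.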

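A. The differential equation $\partial_t\psi_n = -\frac{u^2}{2}\eta_n\psi_n + \varepsilon_n$

Recall that $\psi_n(u,\rho) = \varphi_n(u,\rho)\,e^{-iuV_n(\rho)}$ where $\varphi_n(u,t) = \mathbb{E}\big(e^{iuI_n(t)}\big)$. As $V_n'(t) = n\,\delta(t)\tilde\delta(t)$, we obtain:

\[ \frac{\partial\psi_n}{\partial t} = e^{-iuV_n(t)}\,\frac{\partial\varphi_n}{\partial t} - iu\,n\,\delta(t)\tilde\delta(t)\,\psi_n(u,t). \tag{39} \]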



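Since $I'(t) = n^{-1}\mathrm{tr}\,H(t)YY^*$, we have:

\[ \frac{\partial\varphi_n}{\partial t} = iu\,\mathbb{E}\Big[\mathrm{tr}\Big(H(t)\frac{YY^*}{n}\Big)e^{iuI}\Big] = \frac{iu}{n}\sum_{p,i=1}^{N}\sum_{j=1}^{n}\mathbb{E}\big[Y_{ij}H_{pi}\overline{Y_{pj}}\,e^{iuI}\big]. \tag{40} \]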
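Applying the Integration by parts formula (18) to $\mathbb{E}\big[Y_{ij}H_{pi}\overline{Y_{pj}}\,e^{iuI}\big]$ (which can be written $\mathbb{E}(Y_{ij}\Gamma(Y))$ for $\Gamma(Y) = H_{pi}\overline{Y_{pj}}\,e^{iuI}$) and using the differentiation formulas (14) and (13) yields:

\[
\begin{aligned}
\mathbb{E}\big[Y_{ij}H_{pi}\overline{Y_{pj}}\,e^{iuI}\big]
 &= d_i\tilde d_j\,\mathbb{E}\Big[\frac{\partial\big(H_{pi}\overline{Y_{pj}}\,e^{iuI}\big)}{\partial\overline{Y_{ij}}}\Big] \\
 &= -\frac{t}{n}\,d_i\tilde d_j\,\mathbb{E}\big[[Hy_j]_p\,H_{ii}\,\overline{Y_{pj}}\,e^{iuI}\big]
    + d_i\tilde d_j\,\delta(i-p)\,\mathbb{E}\big[H_{pi}\,e^{iuI}\big] \\
 &\quad + \frac{iut}{n}\,d_i\tilde d_j\,\mathbb{E}\big[H_{pi}\,\overline{Y_{pj}}\,[Hy_j]_i\,e^{iuI}\big].
\end{aligned}
\tag{41}
\]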



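We now sum over the index $i$ and obtain:

\[
\mathbb{E}\big[[Hy_j]_p\,\overline{Y_{pj}}\,e^{iuI}\big]
 = -t\tilde d_j\,\mathbb{E}\big[\beta\,[Hy_j]_p\,\overline{Y_{pj}}\,e^{iuI}\big]
 + d_p\tilde d_j\,\mathbb{E}\big[H_{pp}\,e^{iuI}\big]
 + \frac{iut}{n}\,\tilde d_j\,\mathbb{E}\big[[HDHy_j]_p\,\overline{Y_{pj}}\,e^{iuI}\big],
\]

where $\beta = n^{-1}\mathrm{tr}\,DH$. Writing $\beta = \overset{\circ}{\beta} + \alpha$ yields:

\[
(1 + t\alpha\tilde d_j)\,\mathbb{E}\big[[Hy_j]_p\,\overline{Y_{pj}}\,e^{iuI}\big]
 = -t\tilde d_j\,\mathbb{E}\big[\overset{\circ}{\beta}\,[Hy_j]_p\,\overline{Y_{pj}}\,e^{iuI}\big]
 + d_p\tilde d_j\,\mathbb{E}\big[H_{pp}\,e^{iuI}\big]
 + \frac{iut}{n}\,\tilde d_j\,\mathbb{E}\big[[HDHy_j]_p\,\overline{Y_{pj}}\,e^{iuI}\big].
\]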
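We now take into account that $\tilde r_j(t) = (1 + t\alpha\tilde d_j)^{-1}$ and sum over $j$:

\[
\mathbb{E}\big[[Hy_j]_p\,\overline{Y_{pj}}\,e^{iuI}\big]
 = -t\tilde d_j\tilde r_j\,\mathbb{E}\big[\overset{\circ}{\beta}\,[Hy_j]_p\,\overline{Y_{pj}}\,e^{iuI}\big]
 + d_p\tilde d_j\tilde r_j\,\mathbb{E}\big[H_{pp}\,e^{iuI}\big]
 + \frac{iut}{n}\,\tilde d_j\tilde r_j\,\mathbb{E}\big[[HDHy_j]_p\,\overline{Y_{pj}}\,e^{iuI}\big],
\tag{42}
\]

\[
\mathbb{E}\Big[\Big[H\frac{YY^*}{n}\Big]_{pp}e^{iuI}\Big]
 = -t\,\mathbb{E}\Big[\overset{\circ}{\beta}\Big[H\frac{Y\tilde D\tilde R Y^*}{n}\Big]_{pp}e^{iuI}\Big]
 + \tilde\alpha\,d_p\,\mathbb{E}\big[H_{pp}\,e^{iuI}\big]
 + \frac{iut}{n}\,\mathbb{E}\Big[\Big[HDH\frac{Y\tilde D\tilde R Y^*}{n}\Big]_{pp}e^{iuI}\Big].
\tag{43}
\]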



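By the Resolvent identity, $\mathbb{E}\big[H_{pp}\,e^{iuI}\big] = \mathbb{E}\big[e^{iuI}\big] - t\,\mathbb{E}\big[\big[H\frac{YY^*}{n}\big]_{pp}e^{iuI}\big]$. Replace now in (43), recall that $r_p(t) = (1 + t\tilde\alpha(t)d_p)^{-1}$ and sum over $p$ to obtain:

\[
\begin{aligned}
\mathbb{E}\Big[\mathrm{tr}\Big(H\frac{YY^*}{n}\Big)e^{iuI}\Big]
 &= \mathrm{tr}(DR)\,\tilde\alpha\,\mathbb{E}\big[e^{iuI}\big]
  + iut\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(RHDH\frac{Y\tilde D\tilde R Y^*}{n}\Big)e^{iuI}\Big]
  - t\,\mathbb{E}\Big[\overset{\circ}{\beta}\,\mathrm{tr}\Big(RH\frac{Y\tilde D\tilde R Y^*}{n}\Big)e^{iuI}\Big] \\
 &= \chi_1 + \chi_2 + \chi_3.
\end{aligned}
\tag{44}
\]

Thanks to Theorem 3,

\[
\mathrm{tr}(DR)\,\tilde\alpha\,\mathbb{E}\big[e^{iuI}\big]
 = \mathrm{tr}(DT)\,\tilde\delta\,\mathbb{E}\big[e^{iuI}\big] + O(n^{-1})
 = n\,\delta\tilde\delta\,\mathbb{E}\big[e^{iuI}\big] + O(n^{-1}).
\tag{45}
\]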
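In order to deal with $\chi_2$, we apply the results of Proposition 4 related to $\Psi(Y)$ in the particular case where $A = R$ and $B = \tilde D\tilde R$. In this case, $\chi_2$ writes $\chi_2 = iut\,\mathbb{E}\big(\Psi(Y)e^{iuI}\big)$, and Cauchy-Schwarz inequality yields:

\[
\Big|\mathbb{E}\big(\Psi e^{iuI}\big) - \mathbb{E}\big(e^{iuI}\big)\mathbb{E}(\Psi)\Big|
 = \Big|\mathbb{E}\big[e^{iuI}\overset{\circ}{\Psi}\big]\Big|
 \le \sqrt{\mathbb{E}\big|\overset{\circ}{\Psi}\big|^2} = O(n^{-1}).
\]

Therefore,

\[ \mathbb{E}\big(\Psi e^{iuI}\big) = \mathbb{E}\big(e^{iuI}\big)\,\mathbb{E}(\Psi) + O(n^{-1}). \]

We now use the approximation for $\mathbb{E}\Psi(Y)$ given in Proposition 4. By Theorem 3 we can replace $R$ (resp. $\tilde R$) by $T$ (resp. $\tilde T$) in the obtained expression. We therefore obtain:

\[
\begin{aligned}
\mathbb{E}\big[\Psi(Y)e^{iuI}\big]
 &= \mathbb{E}\Psi(Y)\,\mathbb{E}\big[e^{iuI}\big] + O(n^{-1}) \\
 &= \frac{1}{1 - t^2\gamma\tilde\gamma}\Big(\tilde\gamma\,\frac{1}{n}\mathrm{tr}\big(D^2T^3\big)
   - t\gamma\,\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\,\frac{1}{n}\mathrm{tr}\big(DT^2\big)\Big)\mathbb{E}\big[e^{iuI}\big] + O(n^{-1}).
\end{aligned}
\tag{46}
\]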

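The term $\chi_3$ can be handled similarly: we apply the results of Proposition 4 related to $\Phi(Y)$ in the particular case where $A = R$ and $B = \tilde D\tilde R$. In this case, $\chi_3$ writes $\chi_3 = -tn\,\mathbb{E}\big(\overset{\circ}{\beta}\,\Phi(Y)e^{iuI}\big)$, and Cauchy-Schwarz inequality yields:

\[
\Big|\mathbb{E}\big(\overset{\circ}{\beta}\,\Phi\,e^{iuI}\big) - \mathbb{E}\big(\overset{\circ}{\beta}\,e^{iuI}\big)\mathbb{E}(\Phi)\Big|
 = \Big|\mathbb{E}\big[\overset{\circ}{\beta}\,e^{iuI}\overset{\circ}{\Phi}\big]\Big|
 \le \sqrt{\mathbb{E}\,\overset{\circ}{\beta}{}^2}\,\sqrt{\mathbb{E}\big|\overset{\circ}{\Phi}\big|^2} = O(n^{-2}).
\]

We therefore obtain

\[
\begin{aligned}
\mathbb{E}\Big[\overset{\circ}{\beta}\,\mathrm{tr}\Big(RH\frac{Y\tilde D\tilde R Y^*}{n}\Big)e^{iuI}\Big]
 &= \mathbb{E}\big[\overset{\circ}{\beta}\,e^{iuI}\big]\,\frac{1}{n}\mathrm{tr}\big(\tilde D^2\tilde T\tilde R\big)\,\mathrm{tr}(DTR) + O(n^{-1}) \\
 &\overset{(a)}{=} \mathbb{E}\big[\overset{\circ}{\beta}\,e^{iuI}\big]\,\tilde\gamma\,\mathrm{tr}\big(DT^2\big) + O(n^{-1}),
\end{aligned}
\tag{47}
\]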



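where (a) follows from Theorem 3. It remains to deal with the term $\mathbb{E}\big[\overset{\circ}{\beta}\,e^{iuI}\big]$. To this end, we shall rely on (43) and develop the term $\mathbb{E}\big[H_{pp}\,e^{iuI}\big]$. The Resolvent identity yields:

\[
\mathbb{E}\Big[\Big[H\frac{YY^*}{n}\Big]_{pp}e^{iuI}\Big]
 = -\frac{1}{t}\,\mathbb{E}\big[H_{pp}\,e^{iuI}\big] + \frac{1}{t}\,\mathbb{E}\big[e^{iuI}\big].
\]

Plugging this equality into (43) and using $r_p = (1 + t\tilde\alpha d_p)^{-1}$, we obtain after some computations

\[
\begin{aligned}
\mathbb{E}\big[\overset{\circ}{\beta}\,e^{iuI}\big]
 &= t^2\,\mathbb{E}\Big[\overset{\circ}{\beta}\,e^{iuI}\,\frac{1}{n}\mathrm{tr}\Big(RDH\frac{Y\tilde D\tilde R Y^*}{n}\Big)\Big]
  - \frac{iut^2}{n}\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(RDHDH\frac{Y\tilde D\tilde R Y^*}{n}\Big)e^{iuI}\Big]
  + \frac{1}{n}\mathrm{tr}\big(D(R - \mathbb{E}[H])\big)\,\mathbb{E}\big[e^{iuI}\big] \\
 &\overset{(a)}{=} t^2\gamma\tilde\gamma\,\mathbb{E}\big[\overset{\circ}{\beta}\,e^{iuI}\big]
  - \frac{1}{n}\,\frac{iut^2}{1 - t^2\gamma\tilde\gamma}\Big(\tilde\gamma\,\frac{1}{n}\mathrm{tr}\big(D^3T^3\big) - t\gamma^2\,\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\Big)\mathbb{E}\big[e^{iuI}\big] + O(n^{-2}),
\end{aligned}
\]

where (a) follows from Theorem 3, Proposition 4 and Proposition 5. We therefore obtain:

\[
\mathbb{E}\big[\overset{\circ}{\beta}\,e^{iuI}\big]
 = -\frac{1}{n}\,\frac{iut^2}{(1 - t^2\gamma\tilde\gamma)^2}\Big(\tilde\gamma\,\frac{1}{n}\mathrm{tr}\big(D^3T^3\big) - t\gamma^2\,\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\Big)\mathbb{E}\big[e^{iuI}\big] + O(n^{-2}).
\tag{48}
\]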

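Plugging (48) into (47), and the result together with (45) and (46) into (44), and getting back to (40) and (39), we obtain:

\[ \frac{\partial\psi_n(u,t)}{\partial t} = -\frac{u^2}{2}\,\eta_n(t)\,\psi_n(u,t) + O(n^{-1}), \]

where

\[
\eta_n(t) = \frac{2t}{1 - t^2\gamma\tilde\gamma}\left(
 \tilde\gamma\,\frac{1}{n}\mathrm{tr}\big(D^2T^3\big)
 - t\gamma\,\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\frac{1}{n}\mathrm{tr}\big(DT^2\big)
 + \frac{t^2\tilde\gamma}{1 - t^2\gamma\tilde\gamma}\Big(\tilde\gamma\,\frac{1}{n}\mathrm{tr}\big(D^3T^3\big) - t\gamma^2\,\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\Big)\frac{1}{n}\mathrm{tr}\big(DT^2\big)\right).
\tag{49}
\]

Equation (37) is established, and the first step of the proof is completed.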

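B. Identification of the variance

In order to finish the proof, it remains to prove that:

\[ \eta_n(t) = \frac{d\sigma_n^2(t)}{dt} \quad\text{where}\quad \sigma_n^2(t) = -\log\big(1 - t^2\gamma_n(t)\tilde\gamma_n(t)\big). \tag{50} \]

To this end, we first begin by computing the derivatives of $\gamma_n(t)$ and $\tilde\gamma_n(t)$. We shall prove that

\[
\frac{d\gamma}{dt} = -2\,\frac{\frac{1}{n}\mathrm{tr}\big(D^3T^3\big)\,\frac{1}{n}\mathrm{tr}\big(\tilde D\tilde T^2\big)}{1 - t^2\gamma\tilde\gamma}
\quad\text{and}\quad
\frac{d\tilde\gamma}{dt} = -2\,\frac{\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\,\frac{1}{n}\mathrm{tr}\big(DT^2\big)}{1 - t^2\gamma\tilde\gamma}.
\tag{51}
\]

We only derive $d\tilde\gamma/dt$, the computations being similar in the other case. We first expand the expression of $\tilde\gamma$, and obtain:

\[
\frac{d\tilde\gamma}{dt} = \frac{d}{dt}\Big(\frac{1}{n}\mathrm{tr}\,\tilde D^2\tilde T^2\Big) = -2\,\frac{d\big(t\delta(t)\big)}{dt}\,\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big).
\tag{52}
\]

Let us now compute $\delta'(t)$:

\[ \delta'(t) = \Big(\frac{1}{n}\mathrm{tr}\,DT\Big)' = -\gamma\,\tilde\delta(t) - \gamma\,t\,\tilde\delta'(t). \tag{53} \]

A similar computation yields $\tilde\delta'(t) = -\tilde\gamma\,\delta(t) - \tilde\gamma\,t\,\delta'(t)$. Combining both equations yields:

\[ \delta' = \frac{t\gamma\tilde\gamma\,\delta - \gamma\,\tilde\delta}{1 - t^2\gamma\tilde\gamma}. \]

We now plug this into (52) and obtain:

\[ \frac{d\tilde\gamma}{dt} = -2\,\frac{\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\,\big(\delta - t\gamma\tilde\delta\big)}{1 - t^2\gamma\tilde\gamma}. \tag{54} \]

Recall now that the mere definition of $T$, $\tilde T$, $\delta$ and $\tilde\delta$ yields

\[ t\tilde\delta\,DT = I - T, \qquad t\delta\,\tilde D\tilde T = I - \tilde T. \tag{55} \]

Using (55), we obtain:

\[ n^{-1}\mathrm{tr}\big(DT^2\big) = n^{-1}\mathrm{tr}\big(DT(I - t\tilde\delta DT)\big) = \delta - t\tilde\delta\gamma, \tag{56} \]
\[ n^{-1}\mathrm{tr}\big(\tilde D\tilde T^2\big) = n^{-1}\mathrm{tr}\big(\tilde D\tilde T(I - t\delta\tilde D\tilde T)\big) = \tilde\delta - t\delta\tilde\gamma. \tag{57} \]

It remains to plug (56) in (54) to conclude the proof of (51).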
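The identities (56), (57) and the derivative formulas (51) lend themselves to a quick numerical sanity check. The sketch below is only an illustration under stated assumptions: the diagonal entries `d`, `dt` (standing for the diagonals of $D$ and $\tilde D$), the value of `t` and the helper names `solve_delta` and `traces` are hypothetical choices, and $\delta$, $\tilde\delta$ are computed by a plain fixed-point iteration rather than by any method taken from the paper.

```python
def solve_delta(t, d, dt, n, tol=1e-14, max_iter=10000):
    """Fixed-point iteration for delta = tr(D T)/n and delta_tilde = tr(D~ T~)/n."""
    delta, delta_t = sum(d) / n, sum(dt) / n  # exact values at t = 0
    for _ in range(max_iter):
        new_delta_t = sum(x / (1 + t * x * delta) for x in dt) / n
        new_delta = sum(x / (1 + t * x * new_delta_t) for x in d) / n
        done = abs(new_delta - delta) + abs(new_delta_t - delta_t) < tol
        delta, delta_t = new_delta, new_delta_t
        if done:
            break
    return delta, delta_t


def traces(t, d, dt, n):
    """Normalized traces built from T = (I + t delta~ D)^-1 and T~ = (I + t delta D~)^-1."""
    delta, delta_t = solve_delta(t, d, dt, n)
    T = [1 / (1 + t * delta_t * x) for x in d]
    Tt = [1 / (1 + t * delta * x) for x in dt]

    def tr(diag, res, p, q):
        return sum(x ** p * y ** q for x, y in zip(diag, res)) / n

    return {
        "delta": delta, "delta_t": delta_t,
        "gamma": tr(d, T, 2, 2), "gamma_t": tr(dt, Tt, 2, 2),
        "DT2": tr(d, T, 1, 2), "DtTt2": tr(dt, Tt, 1, 2),
        "D3T3": tr(d, T, 3, 3), "Dt3Tt3": tr(dt, Tt, 3, 3),
    }


# Hypothetical example: N = 4 receive and n = 3 transmit antennas.
d, dt = [0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 0.5]
n, t, h = len(dt), 0.7, 1e-4

q = traces(t, d, dt, n)
denom = 1 - t ** 2 * q["gamma"] * q["gamma_t"]

# Residuals of identities (56) and (57).
res56 = q["DT2"] - (q["delta"] - t * q["delta_t"] * q["gamma"])
res57 = q["DtTt2"] - (q["delta_t"] - t * q["delta"] * q["gamma_t"])

# Central finite differences versus the closed forms (51).
plus, minus = traces(t + h, d, dt, n), traces(t - h, d, dt, n)
dgamma_fd = (plus["gamma"] - minus["gamma"]) / (2 * h)
dgamma_t_fd = (plus["gamma_t"] - minus["gamma_t"]) / (2 * h)
dgamma_51 = -2 * q["D3T3"] * q["DtTt2"] / denom
dgamma_t_51 = -2 * q["Dt3Tt3"] * q["DT2"] / denom

print(res56, res57, dgamma_fd - dgamma_51, dgamma_t_fd - dgamma_t_51)
```

All four printed quantities should be negligible: the first two are limited by the accuracy of the fixed-point solver, the last two by the finite-difference step `h`.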

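We are now in position to prove (50). The main idea in the following computations is to express (49) as a symmetric quantity with respect to $\delta$ and $T$ on the one hand and $\tilde\delta$ and $\tilde T$ on the other hand. To this end, we split $\eta_n(t)$ in (49) as $\eta_n(t) = \frac{2t}{1 - t^2\gamma\tilde\gamma}\big(\eta^{(1)} + \eta^{(2)} + \eta^{(3)}\big)$, where

\[
\eta^{(1)} = \tilde\gamma\,\frac{1}{n}\mathrm{tr}\big(D^2T^3\big), \qquad
\eta^{(2)} = -t\gamma\,\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\frac{1}{n}\mathrm{tr}\big(DT^2\big),
\]

and $\eta^{(3)}$ collects the remaining terms. We first work on $\eta^{(3)}$:

\[
\begin{aligned}
\eta^{(3)} &= \frac{t^2\tilde\gamma^2\,\frac{1}{n}\mathrm{tr}\big(D^3T^3\big)\frac{1}{n}\mathrm{tr}\big(DT^2\big)}{1 - t^2\gamma\tilde\gamma}
 - \frac{t^3\gamma^2\tilde\gamma\,\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\frac{1}{n}\mathrm{tr}\big(DT^2\big)}{1 - t^2\gamma\tilde\gamma} \\
 &\overset{(a),(b)}{=} \frac{-t\tilde\gamma\,\frac{1}{n}\mathrm{tr}\big(D^3T^3\big)\frac{1}{n}\mathrm{tr}\big(\tilde D\tilde T^2\big)
 - t^3\gamma^2\tilde\gamma\,\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\frac{1}{n}\mathrm{tr}\big(DT^2\big)}{1 - t^2\gamma\tilde\gamma}
 + t\tilde\gamma\tilde\delta\,\frac{1}{n}\mathrm{tr}\big(D^3T^3\big),
\end{aligned}
\]

where (a) follows from (56), and (b) from (57). We now look at $\eta^{(1)}$:

\[
\eta^{(1)} + t\tilde\gamma\tilde\delta\,\frac{1}{n}\mathrm{tr}\big(D^3T^3\big)
 = \tilde\gamma\Big(\frac{1}{n}\mathrm{tr}\big(D^2T^3\big) + \frac{1}{n}\mathrm{tr}\big(D^2T^2(t\tilde\delta DT)\big)\Big)
 = \gamma\tilde\gamma,
\]

where the last equality follows from (55) again. Since $1 + t^2\gamma\tilde\gamma/(1 - t^2\gamma\tilde\gamma) = 1/(1 - t^2\gamma\tilde\gamma)$, the term $\eta^{(2)}$ combines with the second term of $\eta^{(3)}$. We therefore have

\[
\begin{aligned}
\eta_n(t) &= \frac{2t\gamma\tilde\gamma}{1 - t^2\gamma\tilde\gamma}
 - \frac{2t^2\tilde\gamma\,\frac{1}{n}\mathrm{tr}\big(D^3T^3\big)\frac{1}{n}\mathrm{tr}\big(\tilde D\tilde T^2\big)}{(1 - t^2\gamma\tilde\gamma)^2}
 - \frac{2t^2\gamma\,\frac{1}{n}\mathrm{tr}\big(\tilde D^3\tilde T^3\big)\frac{1}{n}\mathrm{tr}\big(DT^2\big)}{(1 - t^2\gamma\tilde\gamma)^2} \\
 &\overset{(a)}{=} \frac{t^2\gamma\tilde\gamma' + t^2\gamma'\tilde\gamma + 2t\gamma\tilde\gamma}{1 - t^2\gamma\tilde\gamma}
 = \frac{d\sigma_n^2(t)}{dt},
\end{aligned}
\]

where (a) follows from (51). This concludes the identification of the variance.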
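The identity $\eta_n = d\sigma_n^2/dt$ can also be checked numerically. The sketch below assumes that $\eta_n$ has the form obtained by combining (46), (47) and (48) as in the derivation of this section, namely $\eta_n = 2t\,\mathbb{E}\Psi$-type term plus the contribution of $\mathbb{E}[\overset{\circ}{\beta}e^{iuI}]$; the diagonals `d`, `dt`, the point `t` and the helper names are hypothetical, and the derivative of $\sigma_n^2$ is approximated by a central finite difference.

```python
import math


def solve_delta(t, d, dt, n, tol=1e-14, max_iter=10000):
    """Fixed-point iteration for delta and delta_tilde."""
    delta, delta_t = sum(d) / n, sum(dt) / n
    for _ in range(max_iter):
        new_delta_t = sum(x / (1 + t * x * delta) for x in dt) / n
        new_delta = sum(x / (1 + t * x * new_delta_t) for x in d) / n
        done = abs(new_delta - delta) + abs(new_delta_t - delta_t) < tol
        delta, delta_t = new_delta, new_delta_t
        if done:
            break
    return delta, delta_t


def traces(t, d, dt, n):
    delta, delta_t = solve_delta(t, d, dt, n)
    T = [1 / (1 + t * delta_t * x) for x in d]
    Tt = [1 / (1 + t * delta * x) for x in dt]

    def tr(diag, res, p, q):
        return sum(x ** p * y ** q for x, y in zip(diag, res)) / n

    return {
        "gamma": tr(d, T, 2, 2), "gamma_t": tr(dt, Tt, 2, 2),
        "DT2": tr(d, T, 1, 2), "D2T3": tr(d, T, 2, 3),
        "D3T3": tr(d, T, 3, 3), "Dt3Tt3": tr(dt, Tt, 3, 3),
    }


def sigma2(t, d, dt, n):
    """sigma_n^2(t) = -log(1 - t^2 gamma gamma~)."""
    q = traces(t, d, dt, n)
    return -math.log(1 - t ** 2 * q["gamma"] * q["gamma_t"])


def eta(t, d, dt, n):
    """eta_n(t) assembled from the right-hand sides of (46) and (48)."""
    q = traces(t, d, dt, n)
    g, gt = q["gamma"], q["gamma_t"]
    denom = 1 - t ** 2 * g * gt
    from_chi2 = gt * q["D2T3"] - t * g * q["Dt3Tt3"] * q["DT2"]
    from_chi3 = (t ** 2 * gt / denom) * (gt * q["D3T3"] - t * g ** 2 * q["Dt3Tt3"]) * q["DT2"]
    return 2 * t / denom * (from_chi2 + from_chi3)


# Hypothetical example: N = 4, n = 3.
d, dt = [0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 0.5]
n, t, h = len(dt), 0.7, 1e-4

eta_closed = eta(t, d, dt, n)
eta_fd = (sigma2(t + h, d, dt, n) - sigma2(t - h, d, dt, n)) / (2 * h)
print(eta_closed, eta_fd)
```

The two printed values should agree up to the finite-difference error, which is a numerical counterpart of (50).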



C. Integration of the differential equation (37)

Let us introduce $K_n(u,\rho) = \psi_n(u,\rho)\,e^{u^2\sigma_n^2(\rho)/2}$. Due to (37), $K_n(u,\rho)$ readily satisfies the following differential equation:

\[ \frac{\partial K_n(u,t)}{\partial t} = \varepsilon_n(u,t)\,e^{u^2\sigma_n^2(t)/2}. \tag{58} \]

As in Section IV-C one can easily prove that $|\varepsilon_n(t)| \le \frac{K}{n}$ for every $t \in [0,\rho]$. As $K_n(u,0) = 1$, we get

\[ K_n(u,\rho) = 1 + \int_0^\rho \varepsilon_n(u,t)\,e^{u^2\sigma_n^2(t)/2}\,dt. \]

Due to Proposition 2, $\sigma_n^2(t)$ is bounded from above uniformly in $n$ and $t \in [0,\rho]$. This fact, together with $|\varepsilon_n(t)| \le \frac{K}{n}$, implies that:

\[ K_n(u,\rho) = 1 + O\Big(\frac{1}{n}\Big). \]

This in turn yields

\[ \psi_n(u,\rho) = \big(1 + O(n^{-1})\big)\,e^{-u^2\sigma_n^2(\rho)/2} = e^{-u^2\sigma_n^2(\rho)/2} + O(n^{-1}), \]

where the last equality follows from the fact that $\sigma_n^2(\rho)$ is uniformly bounded in $n$ by Proposition 2.
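The Central Limit Theorem just established can be illustrated by a small Monte Carlo experiment. The sketch below is an illustration only, under several assumptions: the channel is simulated as $Y_{ij} = \sqrt{d_i\tilde d_j}\,X_{ij}$ with i.i.d. standard complex Gaussian $X_{ij}$; the closed form used for $V_n(\rho)$, namely $\log\det(I + \rho\tilde\delta D) + \log\det(I + \rho\delta\tilde D) - n\rho\delta\tilde\delta$, is assumed here (it vanishes at $\rho = 0$ and is consistent with $V_n'(t) = n\delta\tilde\delta$); the sizes, the seed and all function names are hypothetical.

```python
import math
import random


def solve_delta(t, d, dt, n, tol=1e-14, max_iter=10000):
    """Fixed-point iteration for delta and delta_tilde."""
    delta, delta_t = sum(d) / n, sum(dt) / n
    for _ in range(max_iter):
        new_delta_t = sum(x / (1 + t * x * delta) for x in dt) / n
        new_delta = sum(x / (1 + t * x * new_delta_t) for x in d) / n
        done = abs(new_delta - delta) + abs(new_delta_t - delta_t) < tol
        delta, delta_t = new_delta, new_delta_t
        if done:
            break
    return delta, delta_t


def deterministic_mean_and_variance(rho, d, dt):
    """Return (V_n(rho), sigma_n^2(rho)) under the assumed closed form of V_n."""
    n = len(dt)
    delta, delta_t = solve_delta(rho, d, dt, n)
    v = (sum(math.log(1 + rho * delta_t * x) for x in d)
         + sum(math.log(1 + rho * delta * x) for x in dt)
         - n * rho * delta * delta_t)
    gamma = sum((x / (1 + rho * delta_t * x)) ** 2 for x in d) / n
    gamma_t = sum((x / (1 + rho * delta * x)) ** 2 for x in dt) / n
    return v, -math.log(1 - rho ** 2 * gamma * gamma_t)


def sample_mutual_information(rho, d, dt, rng):
    """One draw of log det(I + rho/n Y Y*), computed by Gaussian elimination."""
    big_n, n = len(d), len(dt)
    y = [[math.sqrt(d[i] * dt[j] / 2) * complex(rng.gauss(0, 1), rng.gauss(0, 1))
          for j in range(n)] for i in range(big_n)]
    m = [[(1.0 if a == b else 0.0)
          + rho / n * sum(y[a][j] * y[b][j].conjugate() for j in range(n))
          for b in range(big_n)] for a in range(big_n)]
    logdet = 0.0
    for k in range(big_n):  # pivots of a positive definite Hermitian matrix are real > 0
        pivot = m[k][k]
        logdet += math.log(pivot.real)
        for i in range(k + 1, big_n):
            f = m[i][k] / pivot
            for j in range(k + 1, big_n):
                m[i][j] -= f * m[k][j]
    return logdet


rng = random.Random(0)
d, dt, rho = [0.5, 1.0, 1.5, 1.0] * 2, [1.0] * 8, 1.0
draws = [sample_mutual_information(rho, d, dt, rng) for _ in range(400)]
emp_mean = sum(draws) / len(draws)
emp_var = sum((x - emp_mean) ** 2 for x in draws) / (len(draws) - 1)
v_n, sigma2_n = deterministic_mean_and_variance(rho, d, dt)
print(emp_mean, v_n, emp_var, sigma2_n)
```

With these small dimensions only rough agreement between the empirical and deterministic values should be expected; the match tightens as `N` and `n` grow at the same rate.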

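Appendix

A. Proof of Proposition 1

Let us first establish the existence and uniqueness of the solution of the system defining $\delta$ and $\tilde\delta$. To this end, we plug the expression of $\tilde\delta$ into the equation satisfied by $\delta$. The system of two equations reduces to the single equation $\delta = f(t,\delta)$ where $f(t,\delta)$ is defined by

\[ f(t,\delta) = \frac{1}{n}\,\mathrm{tr}\,D\Big(I + t\,\frac{1}{n}\mathrm{tr}\big(\tilde D(I + t\delta\tilde D)^{-1}\big)\,D\Big)^{-1}, \tag{59} \]

which is itself equivalent to $g(t,\delta) = 1$ where

\[ g(t,\delta) = \frac{f(t,\delta)}{\delta} = \frac{1}{n}\,\mathrm{tr}\,D\Big(\delta I + t\,\frac{1}{n}\mathrm{tr}\big(\delta\tilde D(I + t\delta\tilde D)^{-1}\big)\,D\Big)^{-1}. \]

The function $\delta \mapsto g(t,\delta)$ is continuous, decreasing and satisfies $g(t,0) = +\infty$ and $g(t,+\infty) = 0$. Therefore, the equation $g(t,\delta) = 1$ has a unique solution $\delta(t) > 0$.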
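The monotonicity argument above translates directly into a bisection scheme. The following sketch is a minimal illustration assuming diagonal $D$ and $\tilde D$ given as Python lists `d` and `dt` (hypothetical names); it solves $g(t,\delta) = 1$ by bisection and then recovers $\tilde\delta$ from its defining equation.

```python
import math


def g(t, delta, d, dt, n):
    """g(t, delta) = f(t, delta) / delta, a decreasing function of delta."""
    # delta * (1/n) tr( D~ (I + t delta D~)^-1 ), increasing in delta
    s = sum(delta * x / (1 + t * delta * x) for x in dt) / n
    return sum(x / (delta + t * s * x) for x in d) / n


def solve_delta_bisection(t, d, dt, n, tol=1e-13):
    """Unique positive root of g(t, .) = 1, and the associated delta_tilde."""
    lo, hi = 1e-12, 1.0
    while g(t, hi, d, dt, n) > 1.0:  # g decreases to 0, so this loop terminates
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(t, mid, d, dt, n) > 1.0:
            lo = mid
        else:
            hi = mid
    delta = 0.5 * (lo + hi)
    delta_t = sum(x / (1 + t * delta * x) for x in dt) / n
    return delta, delta_t


# Case D = D~ = I with N = n and t = 1: delta solves delta^2 + delta - 1 = 0.
delta_id, delta_t_id = solve_delta_bisection(1.0, [1.0] * 4, [1.0] * 4, 4)
golden = (math.sqrt(5) - 1) / 2

# Hypothetical heterogeneous case: check the residual of the equation for delta.
d, dt, t = [0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 0.5], 0.7
delta, delta_t = solve_delta_bisection(t, d, dt, len(dt))
residual = delta - sum(x / (1 + t * delta_t * x) for x in d) / len(dt)
print(delta_id, golden, residual)
```

In the identity case the system collapses to $t\delta^2 + \delta - 1 = 0$, which gives a closed-form value to compare against.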

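The integral representation of $\delta$ and $\tilde\delta$ is related to the Stieltjes representation of a class of analytic functions. One can indeed prove that the functions $t \mapsto \delta(t)$ and $t \mapsto \tilde\delta(t)$ defined on $\mathbb{R}_+$ extend to $\mathbb{C}\setminus\mathbb{R}_-$, are analytic over this set and satisfy the system for every $z \in \mathbb{C}\setminus\mathbb{R}_-$. Relying on specific properties of $\delta(z)$ and $\tilde\delta(z)$, we can prove that the following integral representation holds:

\[ \delta(z) = \int_0^\infty \frac{\mu(d\lambda)}{1+\lambda z} \quad\text{and}\quad \tilde\delta(z) = \int_0^\infty \frac{\tilde\mu(d\lambda)}{1+\lambda z}, \tag{60} \]

where $\mu$ and $\tilde\mu$ are nonnegative measures uniquely defined on $\mathbb{R}_+$ satisfying $\mu(\mathbb{R}_+) = \frac{1}{n}\mathrm{tr}(D)$ and $\tilde\mu(\mathbb{R}_+) = \frac{1}{n}\mathrm{tr}(\tilde D)$. We refer to [9] where a more general result is proven and skip the details.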
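B. Proof of Proposition 2

In order to prove Proposition 2, it is sufficient to first prove that $1 - t^2\gamma\tilde\gamma$ is bounded away from zero and then to prove that the same quantity is strictly lower than 1, uniformly in $n$. We shall proceed in four steps.

1) A priori estimates for $\delta$, $\tilde\delta$, $\gamma$ and $\tilde\gamma$: The mere definition of $\delta$ and $\tilde\delta$ yields:

\[
\delta = \frac{1}{n}\sum_{i=1}^{N}\frac{d_i}{1 + t d_i\tilde\delta} \le \frac{N}{n}\,d_{\max}
\quad\text{and}\quad
\tilde\delta = \frac{1}{n}\sum_{j=1}^{n}\frac{\tilde d_j}{1 + t\tilde d_j\delta} \le \tilde d_{\max}.
\tag{61}
\]

Using these upper estimates, one gets the following lower estimates:

\[
\delta \ge \frac{\frac{1}{n}\mathrm{tr}\,D}{1 + t\,d_{\max}\tilde d_{\max}}
\quad\text{and}\quad
\tilde\delta \ge \frac{\frac{1}{n}\mathrm{tr}\,\tilde D}{1 + t\,\frac{N}{n}\,d_{\max}\tilde d_{\max}}.
\tag{62}
\]

One can notice that due to Assumption (A1), these lower bounds are eventually bounded away from zero.

Finally a straightforward application of Jensen's inequality yields:

\[
\delta^2 = \Big(\frac{1}{n}\sum_{i=1}^{N} d_i T_{ii}\Big)^2 \le \frac{N}{n}\cdot\frac{1}{n}\sum_{i=1}^{N} d_i^2 T_{ii}^2,
\quad\text{i.e.}\quad
\frac{n}{N}\,\delta^2 \le \gamma \quad\text{and}\quad \tilde\delta^2 \le \tilde\gamma.
\tag{63}
\]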

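2) An estimate over $\delta'$: The following equalities are straightforward (see for instance (53)):

\[ \delta'(t) = -\gamma\,\tilde\delta(t) - \gamma\,t\,\tilde\delta'(t) \quad\text{and}\quad \tilde\delta'(t) = -\tilde\gamma\,\delta(t) - \tilde\gamma\,t\,\delta'(t). \tag{64} \]

In particular, $|\delta'(0)| = \gamma(0)\tilde\delta(0) \le N n^{-1} d_{\max}^2\tilde d_{\max}$, which is eventually bounded. Recall that $\delta$ admits the following representation:

\[ \delta(t) = \int_0^\infty \frac{\mu(d\lambda)}{1+\lambda t}, \]

where $\mu$ is a nonnegative measure satisfying $\mu(\mathbb{R}_+) = \frac{1}{n}\mathrm{tr}\,D$. In particular, one obtains:

\[ 0 \le -\delta'(t) = \int_0^\infty \frac{\lambda\,\mu(d\lambda)}{(1+\lambda t)^2} \le -\delta'(0) \le N n^{-1} d_{\max}^2\tilde d_{\max}. \tag{65} \]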

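3) The quantity $1 - t^2\gamma\tilde\gamma$ is bounded away from zero, uniformly in $n$ and for $t \in [0,\rho]$: Eliminating $\tilde\delta'$ between the two equations in (64) yields:

\[
\frac{d\delta}{dt}\,\big(1 - t^2\gamma\tilde\gamma\big) = \gamma\,\big(t\delta\tilde\gamma - \tilde\delta\big)
 = \frac{\gamma}{n}\,\mathrm{tr}\,\tilde D\tilde T\big(t\delta\tilde D\tilde T - I\big) = -\frac{\gamma}{n}\,\mathrm{tr}\,\tilde D\tilde T^2,
\]

where the last equality follows from the identity $\tilde T = (I + t\delta\tilde D)^{-1}$ which yields $(t\delta\tilde D\tilde T - I) = -\tilde T$. Otherwise stated:

\[ 1 - t^2\gamma\tilde\gamma = \frac{\gamma\,\mathrm{tr}\,\tilde D\tilde T^2}{n\,(-\delta'(t))}. \]

This immediately implies that $1 - t^2\gamma\tilde\gamma$ is positive. In order to check that it is bounded away from zero uniformly in $n$, notice first that $n^{-1}\mathrm{tr}\,\tilde D\tilde T^2 \ge \tilde d_{\max}^{-1}\tilde\gamma$. Collecting now the previous estimates (63) and (65), we obtain:

\[ 1 - t^2\gamma\tilde\gamma \ge \frac{n^2\,\delta^2\,\tilde\delta^2}{N^2\,d_{\max}^2\,\tilde d_{\max}^2}. \]

Using (62) and Assumption (A1), we obtain that $1 - t^2\gamma\tilde\gamma$ is bounded away from zero, uniformly in $n$ and for $t \in [0,\rho]$.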



4) The quantity $1 - t^2\gamma\tilde\gamma$ is strictly bounded above from 1, uniformly in $n$: The inequalities (63) together with (62) yield:

\[ \sup_n\big(1 - t^2\gamma\tilde\gamma\big) \le \sup_n\Big(1 - t^2\,\frac{n}{N}\,\delta^2\tilde\delta^2\Big) < 1. \]

This completes the proof of Proposition 2.
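The four steps above can be spot-checked numerically on a grid of values of $t$. The sketch below is illustrative only: the diagonals `d` and `dt` and the grid are hypothetical, $\delta$ and $\tilde\delta$ come from a plain fixed-point iteration, and $\delta'(t)$ is approximated by a central finite difference. It checks that $0 < 1 - t^2\gamma\tilde\gamma < 1$, the Jensen bounds (63), and the identity $(1 - t^2\gamma\tilde\gamma)(-\delta') = \gamma\,n^{-1}\mathrm{tr}\,\tilde D\tilde T^2$ of step 3.

```python
def solve_delta(t, d, dt, n, tol=1e-14, max_iter=10000):
    """Fixed-point iteration for delta and delta_tilde."""
    delta, delta_t = sum(d) / n, sum(dt) / n
    for _ in range(max_iter):
        new_delta_t = sum(x / (1 + t * x * delta) for x in dt) / n
        new_delta = sum(x / (1 + t * x * new_delta_t) for x in d) / n
        done = abs(new_delta - delta) + abs(new_delta_t - delta_t) < tol
        delta, delta_t = new_delta, new_delta_t
        if done:
            break
    return delta, delta_t


def stats(t, d, dt):
    """Return delta, delta~, gamma, gamma~ and (1/n) tr(D~ T~^2) at time t."""
    n = len(dt)
    delta, delta_t = solve_delta(t, d, dt, n)
    T = [1 / (1 + t * delta_t * x) for x in d]
    Tt = [1 / (1 + t * delta * x) for x in dt]
    gamma = sum((x * y) ** 2 for x, y in zip(d, T)) / n
    gamma_t = sum((x * y) ** 2 for x, y in zip(dt, Tt)) / n
    dt_tt2 = sum(x * y * y for x, y in zip(dt, Tt)) / n
    return delta, delta_t, gamma, gamma_t, dt_tt2


d, dt = [0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 0.5]
big_n, n, h = len(d), len(dt), 1e-4

report = []
for k in range(1, 31):
    t = 0.1 * k
    delta, delta_t, gamma, gamma_t, dt_tt2 = stats(t, d, dt)
    one_minus = 1 - t * t * gamma * gamma_t
    d_delta = (stats(t + h, d, dt)[0] - stats(t - h, d, dt)[0]) / (2 * h)
    jensen = (n / big_n * delta ** 2 <= gamma + 1e-12) and (delta_t ** 2 <= gamma_t + 1e-12)
    identity_gap = abs(one_minus * (-d_delta) - gamma * dt_tt2)
    report.append((t, one_minus, jensen, identity_gap))

print(min(r[1] for r in report), max(r[1] for r in report), max(r[3] for r in report))
```

On this example the quantity $1 - t^2\gamma\tilde\gamma$ stays strictly inside $(0,1)$ over the whole grid, as the proposition predicts.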

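C. Proof of Poincaré-Nash inequality

The proof is borrowed from [18]. Recall that $\xi = [\xi_1,\dots,\xi_M]^T$ is a complex Gaussian random vector whose law is determined by

\[ \mathbb{E}[\xi] = 0, \qquad \mathbb{E}[\xi\xi^T] = 0 \qquad\text{and}\qquad \mathbb{E}[\xi\xi^*] = \Xi. \]

Let $\Gamma = \Gamma(\xi_1,\dots,\xi_M,\bar\xi_1,\dots,\bar\xi_M)$ be a $C^1$ complex function polynomially bounded together with its derivatives. We shall prove here Poincaré-Nash inequality

\[ \mathrm{var}(\Gamma(\xi)) \le \mathbb{E}\big[\nabla_z\Gamma(\xi)^T\,\Xi\,\overline{\nabla_z\Gamma(\xi)}\big] + \mathbb{E}\big[(\nabla_{\bar z}\Gamma(\xi))^*\,\Xi\,\nabla_{\bar z}\Gamma(\xi)\big], \]

where $\nabla_z\Gamma = [\partial\Gamma/\partial z_1,\dots,\partial\Gamma/\partial z_M]^T$ and $\nabla_{\bar z}\Gamma = [\partial\Gamma/\partial\bar z_1,\dots,\partial\Gamma/\partial\bar z_M]^T$.

Let $y$ and $z$ be two $\mathbb{C}^{2M}$-valued jointly Gaussian vectors (whose parameters will be specified below). Consider the Gaussian vector $x(t) = \sqrt{t}\,y + \sqrt{1-t}\,z$ and let $\Upsilon : \mathbb{C}^{2M} \to \mathbb{C}$ be a given smooth function $\Upsilon = \Upsilon(z_1,\dots,z_{2M},\bar z_1,\dots,\bar z_{2M})$. Then

\[ \mathbb{E}\,\Upsilon(y) - \mathbb{E}\,\Upsilon(z) = \int_0^1 \frac{d}{dt}\,\mathbb{E}\,\Upsilon(x(t))\,dt. \]

Let $\nabla_z\Upsilon = [\partial\Upsilon/\partial z_1,\dots,\partial\Upsilon/\partial z_{2M}]^T$ and $\nabla_{\bar z}\Upsilon = [\partial\Upsilon/\partial\bar z_1,\dots,\partial\Upsilon/\partial\bar z_{2M}]^T$. Then

\[
\frac{d}{dt}\,\mathbb{E}\,\Upsilon(x(t)) = \mathbb{E}\Big[\Big(\frac{y}{2\sqrt{t}} - \frac{z}{2\sqrt{1-t}}\Big)^T\nabla_z\Upsilon(x(t)) + \Big(\frac{\bar y}{2\sqrt{t}} - \frac{\bar z}{2\sqrt{1-t}}\Big)^T\nabla_{\bar z}\Upsilon(x(t))\Big].
\tag{66}
\]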
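At this point, assume that $y = [u^T, u^T]^T$ and $z = [v^T, w^T]^T$ where $u$, $v$ and $w$ are independent $\mathbb{C}^M$-valued Gaussian vectors having the same law as $\xi$. Moreover, put $\Upsilon(x(t)) = \Gamma(x_1(t))\,\overline{\Gamma(x_2(t))}$ where $x(t)$ is partitioned as $x(t) = [x_1^T(t), x_2^T(t)]^T$. Then

\[ \mathrm{var}(\Gamma(u)) = \mathbb{E}\,\Upsilon(y) - \mathbb{E}\,\Upsilon(z), \]

which leads us to consider the right hand side of Equation (66). The first term there (call it $\chi_1$) writes

\[
\begin{aligned}
\chi_1 &= \mathbb{E}\Big[\Big(\frac{y}{2\sqrt{t}} - \frac{z}{2\sqrt{1-t}}\Big)^T\nabla_z\Upsilon(x(t))\Big] \\
 &= \frac{1}{2\sqrt{t}}\,\mathbb{E}\Big[\overline{\Gamma(x_2(t))}\,u^T\nabla_z\Gamma(x_1(t)) + \Gamma(x_1(t))\,u^T\nabla_z\overline{\Gamma(x_2(t))}\Big] \\
 &\quad - \frac{1}{2\sqrt{1-t}}\,\mathbb{E}\Big[\overline{\Gamma(x_2(t))}\,v^T\nabla_z\Gamma(x_1(t)) + \Gamma(x_1(t))\,w^T\nabla_z\overline{\Gamma(x_2(t))}\Big].
\end{aligned}
\tag{67}
\]

Let us process the term $\mathbb{E}\big[\overline{\Gamma(x_2(t))}\,u^T\nabla_z\Gamma(x_1(t))\big]$. Writing $u = [U_1,\dots,U_M]^T$ and $x_i(t) = [X_{i,1},\dots,X_{i,M}]^T$ for $i = 1, 2$, we have by the Integration by Parts Formula (16)

\[
\begin{aligned}
\mathbb{E}\big[\overline{\Gamma(x_2(t))}\,u^T\nabla_z\Gamma(x_1(t))\big]
 &= \sum_{m=1}^{M}\mathbb{E}\Big[U_m\,\overline{\Gamma(x_2(t))}\,\frac{\partial\Gamma(x_1(t))}{\partial X_{1,m}}\Big] \\
 &= \sqrt{t}\sum_{m,p=1}^{M}\Xi_{mp}\,\mathbb{E}\Big[\frac{\partial\Gamma(x_1(t))}{\partial X_{1,m}}\,\frac{\partial\overline{\Gamma(x_2(t))}}{\partial\overline{X_{2,p}}}
   + \overline{\Gamma(x_2(t))}\,\frac{\partial^2\Gamma(x_1(t))}{\partial\overline{X_{1,p}}\,\partial X_{1,m}}\Big],
\end{aligned}
\]

where we used $x_1(t) = \sqrt{t}\,u + \sqrt{1-t}\,v$ and $x_2(t) = \sqrt{t}\,u + \sqrt{1-t}\,w$ in the second equality. By treating similarly the other terms of the right hand side of (67) and taking the sum, the terms with the second order derivatives $\partial^2/\partial\overline{X_{i,p}}\,\partial X_{i,m}$ disappear and we end up with

\[
\chi_1 = \frac{1}{2}\,\mathbb{E}\Big[\big(\nabla_z\Gamma(x_1(t))\big)^T\,\Xi\,\overline{\nabla_z\Gamma(x_2(t))} + \big(\nabla_{\bar z}\Gamma(x_2(t))\big)^*\,\Xi\,\nabla_{\bar z}\Gamma(x_1(t))\Big],
\tag{68}
\]

where we used the identity $\overline{\partial f/\partial z} = \partial\bar f/\partial\bar z$, the proof of which is straightforward.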
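By using twice the Cauchy-Schwarz inequality we obtain:

\[
\begin{aligned}
\Big|\mathbb{E}\big[\nabla_z\Gamma(x_1(t))^T\,\Xi\,\overline{\nabla_z\Gamma(x_2(t))}\big]\Big|
 &\le \mathbb{E}\Big[\big(\nabla_z\Gamma(x_1(t))^T\,\Xi\,\overline{\nabla_z\Gamma(x_1(t))}\big)^{1/2}\big(\nabla_z\Gamma(x_2(t))^T\,\Xi\,\overline{\nabla_z\Gamma(x_2(t))}\big)^{1/2}\Big] \\
 &\le \Big\{\mathbb{E}\big[\nabla_z\Gamma(x_1(t))^T\,\Xi\,\overline{\nabla_z\Gamma(x_1(t))}\big]\Big\}^{1/2}\Big\{\mathbb{E}\big[\nabla_z\Gamma(x_2(t))^T\,\Xi\,\overline{\nabla_z\Gamma(x_2(t))}\big]\Big\}^{1/2}.
\end{aligned}
\]

The second term of the right hand side of (68) can be bounded in a similar manner. Noticing that $x_1(t)$ and $x_2(t)$ have the same law as $u$, which does not depend on $t$, it results that

\[ |\chi_1| \le \frac{1}{2}\,\mathbb{E}\big[\nabla_z\Gamma(u)^T\,\Xi\,\overline{\nabla_z\Gamma(u)}\big] + \frac{1}{2}\,\mathbb{E}\big[(\nabla_{\bar z}\Gamma(u))^*\,\Xi\,\nabla_{\bar z}\Gamma(u)\big]. \]

The second term of the right hand side of Equation (66) is treated similarly, which leads to the desired result.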
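A one-dimensional example makes the inequality concrete. The sketch below is a toy illustration, not part of the proof: it assumes $M = 1$, a scalar circular Gaussian $\xi$ with $\mathbb{E}|\xi|^2 = s$, and the hypothetical test function $\Gamma(\xi) = \xi\bar\xi$, for which $\partial\Gamma/\partial z = \bar\xi$ and $\partial\Gamma/\partial\bar z = \xi$. In that case the variance equals $s^2$ while the right hand side of the inequality equals $2s^2$; both are estimated by Monte Carlo.

```python
import random


def poincare_nash_demo(s=2.0, samples=20000, seed=1):
    """Monte Carlo estimates of var(Gamma) and of the Poincare-Nash bound."""
    rng = random.Random(seed)
    values, bound_terms = [], []
    for _ in range(samples):
        # Circular complex Gaussian: E[xi] = 0, E[xi^2] = 0, E|xi|^2 = s.
        xi = complex(rng.gauss(0, 1), rng.gauss(0, 1)) * (s / 2) ** 0.5
        values.append(abs(xi) ** 2)  # Gamma(xi) = xi * conj(xi)
        # |dGamma/dz|^2 * s + |dGamma/dzbar|^2 * s, with both moduli equal to |xi|
        bound_terms.append(2 * s * abs(xi) ** 2)
    mean = sum(values) / samples
    variance = sum((v - mean) ** 2 for v in values) / (samples - 1)
    bound = sum(bound_terms) / samples
    return variance, bound


variance, bound = poincare_nash_demo()
print(variance, bound)
```

The estimated variance should be close to $s^2 = 4$ and the estimated bound close to $2s^2 = 8$, so the inequality holds here with a factor of two to spare.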

D. Proof of Theorem 3

We first give a sketch of the proof to emphasize the main ideas over the technical aspects of the proof.

1) We first prove that the asymptotic behaviour of $n^{-1}\mathrm{tr}\,(A(R - T))$ is directly related to the behaviour of $\alpha(t) - \delta(t)$. Similarly, $n^{-1}\mathrm{tr}\,\tilde A(\tilde R - \tilde T)$ is related to $\alpha(t) - \delta(t)$.

2) We extend the definition of $\alpha$ from $t \in \mathbb{R}_+$ to $z \in \mathbb{C}\setminus\mathbb{R}_-$ and establish an integral representation:

\[ \alpha(z) = \int_{\mathbb{R}_+}\frac{\nu(d\lambda)}{1+\lambda z}. \]

As a consequence of the integral representations for $\delta$, $\tilde\delta$ and $\alpha$, we prove that $\delta$, $\tilde\delta$ and $\alpha$ are bounded analytic functions on every compact subset of $\mathbb{C}\setminus\mathbb{R}_-$.

3) As a consequence of this detour in the complex plane, we prove the following weaker result. For all uniformly bounded diagonal matrices $A$ and $\tilde A$, the following holds true:

\[ n^{-1}\mathrm{tr}(AR) = n^{-1}\mathrm{tr}(AT) + o(1), \qquad n^{-1}\mathrm{tr}(\tilde A\tilde R) = n^{-1}\mathrm{tr}(\tilde A\tilde T) + o(1). \]

4) We then refine the previous result in order to get the sharper rate of convergence $O(n^{-2})$ instead of $o(1)$.

The theorem will then be proved.

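1) The asymptotic behaviour of $n^{-1}\mathrm{tr}\,(A(R - T))$ and its relation with $\alpha(t) - \delta(t)$: The standard matrix identity

\[ R - T = R\,(T^{-1} - R^{-1})\,T \]

immediately yields

\[ n^{-1}\mathrm{tr}\,(A(R - T)) = t\,\big(\tilde\delta(t) - \tilde\alpha(t)\big)\,\frac{1}{n}\mathrm{tr}\,(ARDT) \quad\text{and} \]
\[ n^{-1}\mathrm{tr}\,\big(\tilde D(\tilde T - \tilde R)\big) = \tilde\delta(t) - \tilde\alpha(t) = t\,\big(\alpha(t) - \delta(t)\big)\,\frac{1}{n}\mathrm{tr}\,\big(\tilde D\tilde R\tilde D\tilde T\big). \]

Therefore,

\[ n^{-1}\mathrm{tr}\,(A(R - T)) = t^2\,\big(\alpha(t) - \delta(t)\big)\,\frac{1}{n}\mathrm{tr}\,\big(\tilde D\tilde R\tilde D\tilde T\big)\,\frac{1}{n}\mathrm{tr}\,(ARDT). \tag{69} \]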
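2) An integral representation for $\alpha$, and bounds over $\alpha$, $\delta$ and $\tilde\delta$: Recall that $\alpha(t) = \mathbb{E}\big[n^{-1}\mathrm{tr}\big(D(I + tn^{-1}YY^*)^{-1}\big)\big]$. This function readily extends from $t \in \mathbb{R}_+$ to $z \in \mathbb{C}\setminus\mathbb{R}_-$. Moreover, the following representation holds true:

\[ \alpha(z) = \int_0^{+\infty}\frac{\nu(d\lambda)}{1+\lambda z}, \tag{70} \]

where $\nu$ is a uniquely defined positive measure on $\mathbb{R}_+$ such that $\nu(\mathbb{R}_+) = \frac{1}{n}\mathrm{tr}\,D$. To prove this, we introduce the eigenvalue/eigenvector decomposition of matrix $n^{-1}YY^* = \sum_{i=1}^{N}\lambda_i u_iu_i^*$ where $(\lambda_i, 1 \le i \le N)$ and $(u_i, 1 \le i \le N)$ represent its eigenvalues and eigenvectors respectively. The random variable $\beta(z) = \frac{1}{n}\mathrm{tr}\,D\big(I + z\frac{YY^*}{n}\big)^{-1}$ can be written as

\[ \beta(z) = \frac{1}{n}\sum_{i=1}^{N}\frac{u_i^*Du_i}{1+\lambda_i z} = \int_0^{+\infty}\frac{\omega(d\lambda)}{1+\lambda z}, \]

where $\omega$ is the nonnegative random measure defined by

\[ \omega(d\lambda) = \frac{1}{n}\sum_{i=1}^{N}u_i^*Du_i\,\delta(\lambda - \lambda_i)\,d\lambda. \]

Consider now the measure $\nu$ defined by $\nu = \mathbb{E}[\omega]$, that is $\nu(B) = \mathbb{E}[\omega(B)]$ for every Borel set $B \subset \mathbb{R}_+$. It is clear that $\alpha(z) = \mathbb{E}[\beta(z)]$ is given by (70), and that $\nu(\mathbb{R}_+) = \mathbb{E}[\omega(\mathbb{R}_+)]$ is given by

\[ \nu(\mathbb{R}_+) = \mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^{N}u_i^*Du_i\Big] = \mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\,D\Big(\sum_{i=1}^{N}u_iu_i^*\Big)\Big]. \]

As $\sum_i u_iu_i^* = I$, $\nu(\mathbb{R}_+) = \frac{1}{n}\mathrm{tr}\,D$ as expected and representation (70) implies that $\alpha(z)$ is analytic over $\mathbb{C}\setminus\mathbb{R}_-$.

Let $\mathrm{dist}(w,\mathbb{R}_+)$ stand for the distance from element $w \in \mathbb{C}$ to $\mathbb{R}_+$. Then the following holds true for every $z \in \mathbb{C}\setminus\mathbb{R}_-$:

\[ |\alpha(z)| \le \frac{1}{n}\mathrm{tr}(D)\,\frac{1}{|z|}\,\frac{1}{\mathrm{dist}(-\frac{1}{z},\mathbb{R}_+)} \le \frac{N}{n}\,d_{\max}\,\frac{1}{|z|\,\mathrm{dist}(-\frac{1}{z},\mathbb{R}_+)}. \tag{71} \]

Similarly, (60) yields that

\[ |\delta(z)| \le \frac{N}{n}\,d_{\max}\,\frac{1}{|z|\,\mathrm{dist}(-\frac{1}{z},\mathbb{R}_+)}. \tag{72} \]

A similar result holds for $\tilde\delta_n(z)$. These upper bounds imply in particular that $\alpha(z)$, $\delta(z)$ and $\tilde\delta(z)$ are uniformly bounded on each compact subset of $\mathbb{C}\setminus\mathbb{R}_-$.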
3) A weaker result as a consequence of Montel's theorem: We first establish that for all uniformly bounded diagonal matrices $A$ and $\tilde A$,

\[ n^{-1}\mathrm{tr}(AR) = n^{-1}\mathrm{tr}(AT) + o(1), \qquad n^{-1}\mathrm{tr}(\tilde A\tilde R) = n^{-1}\mathrm{tr}(\tilde A\tilde T) + o(1). \tag{73} \]

We take (69) as a starting point. Matrices $R$, $\tilde R$, $T$, and $\tilde T$ have their spectral norms bounded by one for $t \in \mathbb{R}_+$ and matrices $A$, $D$, and $\tilde D$ are also uniformly bounded by assumption. Therefore, the terms $n^{-1}\mathrm{tr}\,(\tilde D\tilde R\tilde D\tilde T)$ and $n^{-1}\mathrm{tr}\,(ARDT)$ are also bounded. In order to prove (73), it is sufficient to prove that $\alpha(t) - \delta(t) = o(1)$. To this end, we make use of Proposition 5 and write $\alpha(t) - \delta(t)$ as

\[ \alpha(t) - \delta(t) = \frac{1}{n}\mathrm{tr}\,(D(R - T)) + \varepsilon_n(t), \]

where $\varepsilon_n(t) = O(n^{-2})$. Using relation (69) for $A = D$, we immediately get that:

\[ \alpha(t) - \delta(t) = \big(\alpha(t) - \delta(t)\big)\,t^2\,\frac{1}{n}\mathrm{tr}\,\big(\tilde D\tilde R\tilde D\tilde T\big)\,\frac{1}{n}\mathrm{tr}\,(DRDT) + \varepsilon_n(t). \tag{74} \]

As $\sup_n\max\big(\|R_n\|, \|\tilde R_n\|, \|T_n\|, \|\tilde T_n\|\big) \le 1$, we have:

\[ \frac{1}{n}\mathrm{tr}\,\big(\tilde D\tilde R\tilde D\tilde T\big)\,\frac{1}{n}\mathrm{tr}\,(DRDT) \le \frac{N}{n}\,d_{\max}^2\,\tilde d_{\max}^2 \le 2c\,d_{\max}^2\,\tilde d_{\max}^2 \]

as soon as $\frac{N}{n} \le 2c$. Therefore, if $t \le t_0 := (2d_{\max}\tilde d_{\max}\sqrt{c})^{-1}$, then

\[ t^2\,\frac{1}{n}\mathrm{tr}\,\big(\tilde D\tilde R\tilde D\tilde T\big)\,\frac{1}{n}\mathrm{tr}\,(DRDT) \le \frac{1}{2} \]

for $n$ large enough. Eq. (74) thus implies that

\[ |\alpha_n(t) - \delta_n(t)| \le 2\,|\varepsilon_n(t)|, \quad\text{i.e.}\quad \alpha(t) - \delta(t) = O(n^{-2}) \quad\text{for } t \le t_0. \tag{75} \]

This in particular implies that $\alpha_n(t) - \delta_n(t) = o(1)$ for $t \le t_0$; however, it remains to establish this convergence for $t > t_0$. To this end, observe that $\alpha_n(z) - \delta_n(z)$ is analytic in $\mathbb{C}\setminus\mathbb{R}_-$ and bounded on each compact subset of $\mathbb{C}\setminus\mathbb{R}_-$. Montel's theorem asserts that the sequence of functions $\alpha_n(z) - \delta_n(z)$ is compact and therefore that there exists a converging subsequence which converges towards an analytic function. Since this limiting function is zero on $[0, t_0[$ by (75), it must be zero everywhere due to analyticity. Therefore from every subsequence, one can extract a subsequence that converges toward zero. Necessarily, $\alpha_n(z) - \delta_n(z)$ converges to zero for every $z \in \mathbb{C}\setminus\mathbb{R}_-$ and in particular for $t \ge 0$. This establishes (73).

Even if the convergence rate of $\alpha_n(t) - \delta_n(t)$ is $O(n^{-2})$ for $t \le t_0$, Montel's theorem does not imply that the convergence rate of $\alpha_n(z) - \delta_n(z)$ remains $O(n^{-2})$ elsewhere. Therefore, there remains some work to be done in order to prove that $\alpha_n(t) - \delta_n(t) = O(n^{-2})$ for each $t > 0$.
4) End of the proof: We take (74) as a starting point. Equations (73) imply that for each $t > 0$,

\[ n^{-1}\mathrm{tr}\,\big(DR(t)DT(t)\big) - \gamma(t) = o(1), \qquad n^{-1}\mathrm{tr}\,\big(\tilde D\tilde R(t)\tilde D\tilde T(t)\big) - \tilde\gamma(t) = o(1), \tag{76} \]

where $\gamma_n = n^{-1}\mathrm{tr}\,D^2T^2$ and $\tilde\gamma_n = n^{-1}\mathrm{tr}\,\tilde D^2\tilde T^2$. Thanks to Proposition 5, (76) implies that

\[ \inf_n\Big(1 - t^2\,\frac{1}{n}\mathrm{tr}\,\big(D_nR_n(t)D_nT_n(t)\big)\,\frac{1}{n}\mathrm{tr}\,\big(\tilde D_n\tilde R_n(t)\tilde D_n\tilde T_n(t)\big)\Big) > 0. \]

Equation (74) thus clearly implies that $\alpha(t) - \delta(t)$ is of the same order of magnitude as $\varepsilon_n(t)$, i.e. that $\alpha(t) - \delta(t) = O(n^{-2})$. Theorem 3 is proved.



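E. Proof of Proposition 4-(1) - Variance controls

Consider first $\Phi(Y) = \frac{1}{n}\mathrm{tr}\,\big(AH\frac{YBY^*}{n}\big)$. We use Poincaré-Nash inequality (19) to control the variance of $\Phi$. It writes

\[
\mathbb{E}\big|\overset{\circ}{\Phi}(Y)\big|^2 \le \sum_{i=1}^{N}\sum_{j=1}^{n}d_i\tilde d_j\,\mathbb{E}\Big|\frac{\partial\Phi}{\partial Y_{ij}}\Big|^2 + \sum_{i=1}^{N}\sum_{j=1}^{n}d_i\tilde d_j\,\mathbb{E}\Big|\frac{\partial\Phi}{\partial\overline{Y_{ij}}}\Big|^2.
\tag{77}
\]

We have $\Phi(Y) = (1/n^2)\sum_{p,r=1}^{N}\sum_{q=1}^{n}a_pb_q\,H_{pr}Y_{rq}\overline{Y_{pq}}$. From the differentiation formula (13) we have

\[ \frac{\partial}{\partial Y_{ij}}\big(H_{pr}Y_{rq}\overline{Y_{pq}}\big) = -\frac{t}{n}\,H_{pi}\,[Y^*H]_{jr}\,Y_{rq}\overline{Y_{pq}} + H_{pr}\overline{Y_{pq}}\,\delta(r-i)\,\delta(q-j). \]

Therefore, after a straightforward computation we obtain $\partial\Phi/\partial Y_{ij} = \phi^{(1)}_{ij} + \phi^{(2)}_{ij}$ with

\[ \phi^{(1)}_{ij} = -\frac{t}{n^3}\,[Y^*HYBY^*AH]_{ji} \quad\text{and}\quad \phi^{(2)}_{ij} = \frac{1}{n^2}\,b_j\,[Y^*AH]_{ji}. \]

The first term of the right hand side of inequality (77) can be treated as follows:

\[
\begin{aligned}
\sum_{i=1}^{N}\sum_{j=1}^{n}d_i\tilde d_j\,\mathbb{E}\Big|\frac{\partial\Phi}{\partial Y_{ij}}\Big|^2
 &\le 2\sum_{i=1}^{N}\sum_{j=1}^{n}d_i\tilde d_j\Big(\mathbb{E}\big|\phi^{(1)}_{ij}\big|^2 + \mathbb{E}\big|\phi^{(2)}_{ij}\big|^2\Big) \\
 &= \frac{2t^2}{n^6}\,\mathbb{E}\,\mathrm{tr}\,\big(HYBY^*AHDHAYBY^*HY\tilde DY^*\big) + \frac{2}{n^4}\,\mathbb{E}\,\mathrm{tr}\,\big(AHDHAYB^2\tilde DY^*\big).
\end{aligned}
\tag{78}
\]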
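Let $\|A\| = \sup_n\|A_n\|$. Using inequalities (7), (8), (10) and Cauchy-Schwarz inequality, we have

\[
\begin{aligned}
\frac{2t^2}{n^6}\,\mathbb{E}\,\mathrm{tr}\,\big(HYBY^*AHDHAYBY^*HY\tilde DY^*\big)
 &\le \frac{2t^2}{n^6}\,\mathbb{E}\Big[\sqrt{\mathrm{tr}\,\big(HYBY^*AHDHAYBY^*H\big)^2}\,\sqrt{\mathrm{tr}\,\big(Y\tilde DY^*\big)^2}\Big] \\
 &\le \frac{2t^2}{n^6}\,\mathbb{E}\Big[\|H\|^4\,\|A\|^2\,\|D\|\,\sqrt{\mathrm{tr}\,\big((YBY^*)^4\big)}\,\sqrt{\mathrm{tr}\,\big((Y\tilde DY^*)^2\big)}\Big] \\
 &\le \frac{2d_{\max}\|A\|^2t^2}{n^2}\,\Big[\frac{1}{n}\,\mathbb{E}\,\mathrm{tr}\Big(\frac{YBY^*}{n}\Big)^4\Big]^{1/2}\Big[\frac{1}{n}\,\mathbb{E}\,\mathrm{tr}\Big(\frac{Y\tilde DY^*}{n}\Big)^2\Big]^{1/2}
 \le \frac{K}{n^2},
\end{aligned}
\tag{79}
\]

where the last inequality is due to (11). Turning to the second term of the right hand side of (78), we have

\[
\frac{2}{n^4}\,\mathbb{E}\,\mathrm{tr}\,\big(AHDHAYB^2\tilde DY^*\big) \le \frac{2\|A\|^2d_{\max}}{n^2}\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(\frac{YB^2\tilde DY^*}{n}\Big)\Big] \le \frac{K}{n^2}.
\tag{80}
\]

The second term of the right hand side of Inequality (77) is treated similarly. This proves that $\mathrm{var}(\Phi) = O(n^{-2})$.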



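Consider now $\Psi(Y) = \frac{1}{n}\mathrm{tr}\,\big(AHDH\frac{YBY^*}{n}\big)$. The proof being quite similar to the previous one, we just give its main steps. By Poincaré-Nash inequality (19) we have $\mathbb{E}\big|\overset{\circ}{\Psi}(Y)\big|^2 \le \sum_{i=1}^{N}\sum_{j=1}^{n}d_i\tilde d_j\big(\mathbb{E}|\partial\Psi/\partial Y_{ij}|^2 + \mathbb{E}|\partial\Psi/\partial\overline{Y_{ij}}|^2\big)$. A computation similar to above yields $\partial\Psi/\partial Y_{ij} = \psi^{(1)}_{ij} + \psi^{(2)}_{ij} + \psi^{(3)}_{ij}$ where

\[
\psi^{(1)}_{ij} = -\frac{t}{n^3}\,[Y^*HDHYBY^*AH]_{ji}, \qquad
\psi^{(2)}_{ij} = -\frac{t}{n^3}\,[Y^*HYBY^*AHDH]_{ji}, \qquad
\psi^{(3)}_{ij} = \frac{1}{n^2}\,b_j\,[Y^*AHDH]_{ji}.
\]

We have

\[
\begin{aligned}
\sum_{i=1}^{N}\sum_{j=1}^{n}d_i\tilde d_j\,\mathbb{E}\Big|\frac{\partial\Psi}{\partial Y_{ij}}\Big|^2
 &\le 3\sum_{i=1}^{N}\sum_{j=1}^{n}d_i\tilde d_j\Big(\mathbb{E}\big|\psi^{(1)}_{ij}\big|^2 + \mathbb{E}\big|\psi^{(2)}_{ij}\big|^2 + \mathbb{E}\big|\psi^{(3)}_{ij}\big|^2\Big) \\
 &= \frac{3t^2}{n^6}\,\mathbb{E}\,\mathrm{tr}\,\big(HDHYBY^*AHDHAYBY^*HDHY\tilde DY^*\big) \\
 &\quad + \frac{3t^2}{n^6}\,\mathbb{E}\,\mathrm{tr}\,\big(HYBY^*A(HD)^3HAYBY^*HY\tilde DY^*\big)
  + \frac{3}{n^4}\,\mathbb{E}\,\mathrm{tr}\,\big(A(HD)^3HAYB^2\tilde DY^*\big).
\end{aligned}
\]

The first two terms of the right hand side can be bounded by a series of inequalities similar to inequalities (79). The third term can be bounded as in (80). This ends the proofs of the variance controls in Proposition 4.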



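F. Proof of Proposition 4-(2) - Approximation rules

Consider first $\Phi(Y) = \frac{1}{n}\mathrm{tr}\,\big(AH\frac{YBY^*}{n}\big)$. We write $\mathbb{E}\,\Phi(Y) = (1/n^2)\sum_{p,i=1}^{N}\sum_{j=1}^{n}a_pb_j\,\mathbb{E}\big[Y_{ij}H_{pi}\overline{Y_{pj}}\big]$ and apply the Integration by parts formula (18) to the summand. Using identity (14), we have

\[
\mathbb{E}\big[Y_{ij}H_{pi}\overline{Y_{pj}}\big] = d_i\tilde d_j\,\mathbb{E}\Big[\frac{\partial\big(H_{pi}\overline{Y_{pj}}\big)}{\partial\overline{Y_{ij}}}\Big]
 = -\frac{t}{n}\,d_i\tilde d_j\,\mathbb{E}\big[[Hy_j]_p\,H_{ii}\,\overline{Y_{pj}}\big] + d_i\tilde d_j\,\delta(i-p)\,\mathbb{E}[H_{pi}].
\]

By taking the sum over the index $i$, we obtain $\mathbb{E}\big[[Hy_j]_p\overline{Y_{pj}}\big] = -t\tilde d_j\,\mathbb{E}\big[\beta\,[Hy_j]_p\overline{Y_{pj}}\big] + d_p\tilde d_j\,\mathbb{E}[H_{pp}]$. Writing now $\beta = \overset{\circ}{\beta} + \alpha$ and then grouping together the terms with $\mathbb{E}\big[[Hy_j]_p\overline{Y_{pj}}\big]$, we obtain:

\[ \mathbb{E}\big[[Hy_j]_p\overline{Y_{pj}}\big] = -t\tilde d_j\tilde r_j\,\mathbb{E}\big[\overset{\circ}{\beta}\,[Hy_j]_p\overline{Y_{pj}}\big] + d_p\tilde d_j\tilde r_j\,\mathbb{E}[H_{pp}]. \]

We now sum over $j$ and $p$, and obtain

\[ \mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(AH\frac{YBY^*}{n}\Big)\Big] = \frac{1}{n}\mathrm{tr}\,\big(\tilde D\tilde RB\big)\,\frac{1}{n}\mathrm{tr}\,\big(AD\,\mathbb{E}[H]\big) + \varepsilon, \]

with

\[ \varepsilon = -t\,\mathbb{E}\Big[\overset{\circ}{\beta}\,\frac{1}{n}\mathrm{tr}\Big(AH\frac{Y\tilde D\tilde RBY^*}{n}\Big)\Big]. \]

Applying Cauchy-Schwarz inequality, Proposition 3 and the variance controls in Proposition 4, we get $|\varepsilon| = O(n^{-2})$.

By Theorem 3, $n^{-1}\mathrm{tr}\,(\tilde D\tilde RB) = n^{-1}\mathrm{tr}\,(\tilde D\tilde TB) + O(n^{-2})$. By Theorem 3 and Proposition 5, we obtain $n^{-1}\mathrm{tr}\,(AD\,\mathbb{E}[H]) = n^{-1}\mathrm{tr}\,(ADT) + O(n^{-2})$. This ends the proof of (22).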

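Consider now $\Psi(Y) = \frac{1}{n}\mathrm{tr}\,\big(AHDH\frac{YBY^*}{n}\big)$. In order to compute $\mathbb{E}\,\Psi(Y)$, we shall need the following intermediate result:

Lemma 1: Under the standing assumptions, let $\Gamma(Y) = \frac{1}{n}\mathrm{tr}\,(DHDH)$. Then

1) The following estimate holds true:

\[ \mathrm{var}\,[\Gamma(Y)] = O\Big(\frac{1}{n^2}\Big). \]

2) Moreover,

\[ \mathbb{E}\,[\Gamma(Y)] = \frac{\gamma}{1 - t^2\gamma\tilde\gamma} + O\Big(\frac{1}{n^2}\Big). \]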
Proof: In order to prove Lemma 1-(1), we use the Resolvent identity and write:

\[ DHDH = DHD - \frac{t}{n}\,DHDHYY^*. \]

Since $\mathrm{var}(X + Y) \le 2\,\mathrm{var}(X) + 2\,\mathrm{var}(Y)$, we only need to deal with each term of the right hand side. By Proposition 3, $\mathrm{var}(n^{-1}\mathrm{tr}\,DHD) = O(n^{-2})$ and by Proposition 4-(1), $\mathrm{var}(tn^{-2}\mathrm{tr}\,DHDHYY^*) = O(n^{-2})$, and the proof of Lemma 1-(1) is completed.
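Let us now prove Lemma 1-(2). The Resolvent identity yields:

\[ \mathbb{E}\,[HDH]_{pp} = d_p\,\mathbb{E}[H_{pp}] - t\,\mathbb{E}\Big[HDH\frac{YY^*}{n}\Big]_{pp}. \tag{81} \]

We then write $\mathbb{E}\big[HDH\frac{YY^*}{n}\big]_{pp} = n^{-1}\sum_{k,i=1}^{N}\sum_{j=1}^{n}d_k\,\mathbb{E}\big[H_{pk}H_{ki}Y_{ij}\overline{Y_{pj}}\big]$, and apply the Integration by parts formula (18) and the differentiation formula (13) to the summand. After derivations similar to (41)-(42), we obtain:

\[
\frac{1}{n}\,\mathbb{E}\big[[HDHy_j]_p\,\overline{Y_{pj}}\big]
 = -\frac{t}{n}\,\tilde d_j\tilde r_j\,\mathbb{E}\big[\overset{\circ}{\beta}\,[HDHy_j]_p\,\overline{Y_{pj}}\big]
 - \frac{t}{n}\,\tilde d_j\tilde r_j\,\mathbb{E}\Big[[Hy_j]_p\,\overline{Y_{pj}}\,\frac{1}{n}\mathrm{tr}\,(DHDH)\Big]
 + \frac{1}{n}\,d_p\tilde d_j\tilde r_j\,\mathbb{E}\,[HDH]_{pp}.
\tag{82}
\]

Taking the sum over $j$ and combining with (81) yields:

\[
\mathbb{E}\,[HDH]_{pp}
 = t^2r_p\,\mathbb{E}\Big[\overset{\circ}{\beta}\,\Big[HDH\frac{Y\tilde D\tilde RY^*}{n}\Big]_{pp}\Big]
 + t^2r_p\,\mathbb{E}\Big[\Big[H\frac{Y\tilde D\tilde RY^*}{n}\Big]_{pp}\frac{1}{n}\mathrm{tr}\,(DHDH)\Big]
 + r_pd_p\,\mathbb{E}[H_{pp}].
\tag{83}
\]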



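Taking now the sum over $p$, we obtain:

\[ \mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\,(DHDH)\Big] = \frac{1}{n}\sum_{p=1}^{N}d_p\,\mathbb{E}\,[HDH]_{pp} = \chi_1 + \chi_2 + \chi_3, \]

where

\[
\begin{aligned}
\chi_1 &= t^2\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(DRH\frac{Y\tilde D\tilde RY^*}{n}\Big)\frac{1}{n}\mathrm{tr}\,(DHDH)\Big], \\
\chi_2 &= t^2\,\mathbb{E}\Big[\overset{\circ}{\beta}\,\frac{1}{n}\mathrm{tr}\Big(DRHDH\frac{Y\tilde D\tilde RY^*}{n}\Big)\Big], \\
\chi_3 &= \frac{1}{n}\mathrm{tr}\,\big(D^2R\,\mathbb{E}[H]\big).
\end{aligned}
\tag{84}
\]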
Let us first deal with the terms $\chi_2$ and $\chi_3$. Cauchy-Schwarz inequality together with Proposition 3 and Proposition 4-(1) yield $\chi_2 = O(n^{-2})$. Proposition 5 together with Theorem 3 yield $\chi_3 = \gamma + O(n^{-2})$. We now look at $\chi_1$. Due to Proposition 4-(1) and to Lemma 1-(1), we have:

\[
\begin{aligned}
\chi_1 &= t^2\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(DRH\frac{Y\tilde D\tilde RY^*}{n}\Big)\Big]\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\,(DHDH)\Big] + O\Big(\frac{1}{n^2}\Big) \\
 &\overset{(a)}{=} t^2\gamma\tilde\gamma\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\,(DHDH)\Big] + O\Big(\frac{1}{n^2}\Big),
\end{aligned}
\]

where (a) follows from (22) in Proposition 4. It remains to plug the values obtained for $\chi_1$, $\chi_2$ and $\chi_3$ into (84) to obtain:

\[ (1 - t^2\gamma\tilde\gamma)\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\,(DHDH)\Big] = \gamma + O(n^{-2}). \]

Recalling Proposition 2, we can divide by $(1 - t^2\gamma\tilde\gamma)$ and obtain the desired result. ■
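We can now go back to the computation of $\mathbb{E}\,\Psi(Y)$. Let us give the main steps of the derivation. Expanding $\mathbb{E}\,\Psi(Y)$ yields:

\[ \mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(AHDH\frac{YBY^*}{n}\Big)\Big] = \frac{1}{n^2}\sum_{p=1}^{N}\sum_{j=1}^{n}a_pb_j\,\mathbb{E}\big[[HDHy_j]_p\,\overline{Y_{pj}}\big]. \]

We replace the summand $n^{-1}\mathbb{E}\big[[HDHy_j]_p\overline{Y_{pj}}\big]$ by the expression given by (82). We then replace the term $\mathbb{E}\,[HDH]_{pp}$ in (82) by the expression given by (83). We sum over $p$ and $j$ and notice afterwards that the terms where $\overset{\circ}{\beta}$ is involved are of order $O(n^{-2})$. We therefore end up with:

\[
\begin{aligned}
\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(AHDH\frac{YBY^*}{n}\Big)\Big]
 &= -t\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\,(DHDH)\,\frac{1}{n}\mathrm{tr}\Big(AH\frac{Y\tilde D\tilde RBY^*}{n}\Big)\Big] \\
 &\quad + t^2\,\frac{1}{n}\mathrm{tr}\,\big(\tilde D\tilde RB\big)\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\,(DHDH)\,\frac{1}{n}\mathrm{tr}\Big(ARDH\frac{Y\tilde D\tilde RY^*}{n}\Big)\Big] \\
 &\quad + \frac{1}{n}\mathrm{tr}\,\big(\tilde D\tilde RB\big)\,\frac{1}{n}\mathrm{tr}\,\big(AD^2R\,\mathbb{E}H\big) + O\Big(\frac{1}{n^2}\Big).
\end{aligned}
\]

We first decorrelate by using the variance estimates in Proposition 4-(1) and Lemma 1-(1) and obtain:

\[
\begin{aligned}
\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(AHDH\frac{YBY^*}{n}\Big)\Big]
 &= -t\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\,(DHDH)\Big]\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(AH\frac{Y\tilde D\tilde RBY^*}{n}\Big)\Big] \\
 &\quad + t^2\,\frac{1}{n}\mathrm{tr}\,\big(\tilde D\tilde RB\big)\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\,(DHDH)\Big]\,\mathbb{E}\Big[\frac{1}{n}\mathrm{tr}\Big(ARDH\frac{Y\tilde D\tilde RY^*}{n}\Big)\Big] \\
 &\quad + \frac{1}{n}\mathrm{tr}\,\big(\tilde D\tilde RB\big)\,\frac{1}{n}\mathrm{tr}\,\big(AD^2R\,\mathbb{E}H\big) + O\Big(\frac{1}{n^2}\Big).
\end{aligned}
\]

It remains to apply Theorem 3, Proposition 4 and Lemma 1-(2) to the terms in the right hand side of the previous equality to conclude.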